Abstract

In this article, we introduce a conditional marginal model for longitudinal
data, in which the residuals form a martingale difference sequence. This
model allows us to consider a rich class of estimating equations, which
contains several estimating equations proposed in the literature. A particular
sequence of estimating equations in this class contains a random matrix
$\mathcal{R}^*_{i-1}(\beta)$, as a replacement for the "true" conditional
correlation matrix of the i-th individual. Using the approach of [12], we
identify some sufficient conditions under which this particular sequence of
equations is asymptotically optimal (in our class). In the second part of the
article, we identify a second set of conditions, under which we prove the
existence and strong consistency of a sequence of estimators of $\beta$,
defined as roots of estimation equations which are martingale transforms (in
particular, roots of the sequence of asymptotically optimal equations).

Keywords: longitudinal data; generalized estimating equation; optimal param- 
eter estimation; strong consistency. 
AMS Classification: Primary 62F12; Secondary 62J12. 



1 Introduction 

1.1 Background 

Longitudinal data sets are frequently used in biostatistics, economics, as well as
in educational or environmental studies, when the individual measurements are
recorded over time. Since in most applications the individual measurements
are influenced by a set of explanatory variables (e.g., age, family income, years
of post-secondary education, etc.), the standard approach to longitudinal data
analysis is based on a marginal regression model with unknown parameter $\beta$,
which controls the effects of the explanatory variables. A study of longitudinal
data is technically more demanding than a classical study of cross-sectional
data, since it might have to include a separate model for the (unknown)
correlation structure within the individual measurements. Such a study usually
involves higher costs, but has the advantage of providing the researcher with
the opportunity to control the unmeasured heterogeneity among the response
variables. In the most complex longitudinal scenarios, the observations are
unbalanced (i.e. there is a different number of observations for each individual),
the observational times depend on the individual, and the presence of some
random/fixed effects specific to each individual cannot be ignored. We refer the
reader to the monographs [5], [11] and [25] for a comprehensive account of this
subject.

There are many approaches which have been proposed in the literature for 
treating the unknown correlation/covariance matrix within the individual re- 
sponses. We discuss very briefly the salient features of some of these approaches. 

The estimation of a covariance function in the context of correlated data 
(in particular longitudinal data) is an important statistical problem, for which 
various solutions have been proposed in the literature, using non-parametric 
methods [26], penalized likelihood methods [13], and methods borrowed from
functional data analysis (see [28], [29]). More recently, the authors of [8] (see
also [25]) have proposed a semi-parametric random-effect model for unbalanced
time-dependent longitudinal data:

$$y_i(t) = x_i(t)^T \beta + \delta_i(t)^T \alpha(t) + \varepsilon_i(t), \quad t = t_{ij},\ j = 1, \dots, m_i. \tag{1}$$

In this model, the response $y_i(t)$ of the i-th individual at time $t$ depends linearly
on a p-dimensional covariate vector $x_i(t)$ (through the value of a regression
parameter $\beta$), and a d-dimensional random-effect vector $\delta_i(t)$ (through the value
of a smooth function $\alpha(t)$). The covariance structure within the individual
responses is given by an unknown function $\sigma^2(t) := \mathrm{Var}(\varepsilon(t) \mid x(t), \delta(t))$ and a
family $\rho(s,t \mid \theta) := \mathrm{Corr}(\varepsilon(s), \varepsilon(t))$, which depends on a parameter $\theta$. The joint
estimation of the correlation matrix and the regression parameter is achieved by
iterating between the estimation of $(\sigma^2(t), \theta)$ and $(\alpha(t), \beta)$, using a combination
of non-parametric and parametric techniques.

Other models related to (1) have been considered by various authors. Without
aiming to exhaust the list of contributions in this active area of research, we
mention briefly [30] (which underlines the connection with survival analysis), and
[19] (motivated by an application in educational studies). In [30], the function
$\alpha(t)$ is replaced by a normal random vector $a$ which models the unobserved
subject-specific effects, and the authors achieve a joint estimation of various
parameters describing the "marker process" $y(t)$ and the survival time $T$. On the
other hand, the authors of [19] considered the random-effect model (for balanced
data) $y_i = X_i \beta + \Lambda \delta_i + \varepsilon_i$, where $X_i = (x_{i1}, \dots, x_{im})^T$, $\Lambda$ is an $m \times d$ matrix of
time-invariant individual parameters, and $m$ is the number of observations per
individual.

The most popular approach for treating the unknown correlation structure 
(within the individual measurements) has its origins in [17], and has been
embraced very quickly by the scientific community at large. It is now implemented
in many statistical software packages. This approach is based on the marginal model:

$$y_{ij} = \mu(x_{ij}^T \beta) + \varepsilon_{ij}, \quad j = 1, \dots, m_i. \tag{2}$$

The main idea is to replace the true correlation matrix of the i-th individual
by a correlation matrix $R_i(\alpha)$, which depends on a parameter $\alpha$, and to achieve
a joint estimation of $(\alpha, \beta)$ by iterating between the estimation of $\alpha$ and $\beta$.
The usual recipe for the estimation of $\alpha$ is based on the method of moments.
For the estimation of $\beta$, the authors of [17] suggest a quasi-likelihood method,
inspired by the appealing similarity between a generalized linear model and the
marginal model (2). This involves solving for $\beta$ in the generalized estimating
equation (GEE):

$$g_n^{\mathrm{GEE}}(\beta) := \sum_{i=1}^{n} D_i(\beta)^T V_i(\beta,\alpha)^{-1} (y_i - \mu_i(\beta)) = 0, \tag{3}$$

where $D_i(\beta) = \partial \mu_i(\beta) / \partial \beta^T$, $\mu_i(\beta) = (\mu_{i1}(\beta), \dots, \mu_{i m_i}(\beta))^T$ and $\mu_{ij}(\beta) =
E(y_{ij}) = \mu(x_{ij}^T \beta)$. In this equation, $V_i(\beta,\alpha) = A_i(\beta)^{1/2} R_i(\alpha) A_i(\beta)^{1/2}$ is a
"working covariance" matrix, which is obtained from the matrix $R_i(\alpha)$ and the
diagonal matrix $A_i(\beta)$, built from the marginal variances $\sigma_{ij}^2(\beta) = \mathrm{Var}(y_{ij}) =
\mu'(x_{ij}^T \beta)$. We refer the reader to [27] for an extensive theoretical study of various
asymptotic properties of a different GEE estimator, as well as to [5] and [3T] for
similar studies of GEE's.

The appealing feature of the approach proposed in [17] is that the estimation
of $\beta$ is "derived without specifying the joint distribution of a subject's
observations". Its drawback is that it requires a correct specification of the true
correlation matrix from the very beginning.

One solution to this problem has been proposed recently in [14], using the
theory of optimal parameter estimation, initiated by [5] and described at length
in [12]. The model considered in [14] is written in semi-parametric form:

$$y_{ij} = g_j(X_i, \beta) + \varepsilon_{ij}, \quad j = 1, \dots, m_i,$$

the term $\mu_{ij}(\beta) = g_j(X_i, \beta)$ incorporating the subject-specific random effects.
The authors of [14] propose an iterative procedure for the joint estimation of
$\beta$ and the true covariance matrix $\Sigma_i$. The resulting iterative estimating
equations (IEE) algorithm alternates between the estimation of $\beta$ and the method
of moments estimation of $\Sigma_i$, converges exponentially fast, and yields consistent
estimates for both $\beta$ and $\Sigma_i$. Most importantly, the IEE estimators for $\beta$
produced by this algorithm are (asymptotically) as efficient as the optimal GEE
estimators, defined as solutions of the equations:

$$\sum_{i=1}^{n} D_i(\beta)^T \Sigma_i^{-1} (y_i - \mu_i(\beta)) = 0, \quad n \ge 1. \tag{4}$$

1.2 Our contribution 

In the present article, we consider a marginal model for balanced data, similar
to (2), in which the covariates are non-random and there are no subject-specific
random effects. To simplify the presentation, we assume that the observational
times are equally spaced and do not depend on the subjects. Our goal is to
identify an asymptotically optimal equation (within an appropriate class), in
the sense that it produces an estimator $\hat\beta_n$ which has minimum variance.

The model that we consider allows for some degree of dependence among
the individual responses. In particular, our model includes the case in which
the individual responses are independent. More precisely, we assume that the
conditional mean and variance of the response $y_{ij}$ of individual $i$ at time $j$, given
the "previous" responses $y_1, \dots, y_{i-1}$, do not depend on these responses, and
can be directly expressed using some explanatory variables $x_{ij}$ (through the
regression parameter $\beta$) and a link function $\mu$.

This theoretical relaxation has been inspired by the classical work [16] in the
case of linear regression, in which the residuals are not necessarily independent,
but form a martingale difference sequence. With these residuals as building
blocks, we construct a class of estimating equations, which form a transform
martingale family (see subsection 13.1.2 of [H]). The choice of this class leads
us to an appropriate, and very general, class of estimating equations, in which a
sequence of asymptotically optimal equations can be found. Although we follow
the approach and use the techniques developed in [12], our application does not
seem to have been covered in [12].

After the appropriate class of estimating equations has been selected (see p.
200 of [12]), the initial step of our investigation is to find the "optimal" equation
(within this class). The method that we use for achieving this goal is different
from the one of [14], since we allow that both the true conditional covariance
matrix $\Sigma_i$ and the true correlation matrix $R_i$ depend on the parameter $\beta$. We
find that the optimal estimating equation is:

$$g_n(\beta) := \sum_{i=1}^{n} D_i(\beta)^T \Sigma_i(\beta)^{-1} (y_i - \mu_i(\beta)) = 0, \tag{5}$$

which is significantly different from equation (4): even in the case of linear
regression (i.e. when $\mu(x) = x$), equation (5) cannot be solved explicitly, whereas
(4) has a closed form solution, given by $\hat\beta_n = \left( \sum_{i=1}^{n} X_i^T \Sigma_i^{-1} X_i \right)^{-1} \sum_{i=1}^{n} X_i^T \Sigma_i^{-1} y_i$.
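
To make the closed form concrete, here is a minimal sketch for the linear case $\mu(x) = x$ with covariance matrices that do not depend on $\beta$. The function name `closed_form_root` and the simulated inputs are illustrative assumptions, not part of the original article.

```python
import numpy as np

def closed_form_root(X_list, y_list, Sigma_list):
    """Solve sum_i X_i^T Sigma_i^{-1} (y_i - X_i beta) = 0 for beta.

    Hypothetical helper: assumes the linear link and known, beta-free
    covariance matrices Sigma_i (symmetric positive definite).
    """
    p = X_list[0].shape[1]
    lhs = np.zeros((p, p))
    rhs = np.zeros(p)
    for X, y, Sigma in zip(X_list, y_list, Sigma_list):
        W = np.linalg.solve(Sigma, X)   # Sigma_i^{-1} X_i, shape (m, p)
        lhs += X.T @ W                  # X_i^T Sigma_i^{-1} X_i
        rhs += W.T @ y                  # X_i^T Sigma_i^{-1} y_i (Sigma_i symmetric)
    return np.linalg.solve(lhs, rhs)

# Illustrative data: noiseless responses, so the root must equal beta_true.
rng = np.random.default_rng(0)
n, m, p = 50, 4, 2
beta_true = np.array([1.0, -2.0])
Sigma = 0.5 * np.eye(m) + 0.5           # exchangeable covariance, positive definite
X_list = [rng.normal(size=(m, p)) for _ in range(n)]
y_list = [X @ beta_true for X in X_list]
beta_hat = closed_form_root(X_list, y_list, [Sigma] * n)
```

When the covariance matrix itself depends on $\beta$, the same sums no longer decouple from the unknown, which is why an iterative solver is needed in that case.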

Next, we are interested in identifying an asymptotically optimal estimating
equation, by means of a comparison with the optimal equation. To do this,
we introduce a random matrix $\mathcal{R}^*_{i-1}(\beta)$ as an estimator of the true conditional
correlation matrix $R_i(\beta)$. Under some conditions on the matrix $\mathcal{R}^*_{i-1}(\beta)$, we
find that the following sequence of equations is asymptotically optimal:

$$g_n^*(\beta) := \sum_{i=1}^{n} D_i(\beta)^T V_i^*(\beta)^{-1} (y_i - \mu_i(\beta)) = 0, \quad n \ge 1, \tag{6}$$

where $V_i^*(\beta) = A_i(\beta)^{1/2} \mathcal{R}^*_{i-1}(\beta) A_i(\beta)^{1/2}$. A particular example of random
matrix $\mathcal{R}^*_n(\beta)$, which can be used when the correlation structure is the same for
all individuals, is:

$$\mathcal{R}^*_n(\beta) := \frac{1}{n} \sum_{i=1}^{n} A_i(\beta)^{-1/2} (y_i - \mu_i(\beta)) (y_i - \mu_i(\beta))^T A_i(\beta)^{-1/2}. \tag{7}$$



Computationally, solving this new equation may require more effort than ([3]) or 
(HI, but it has the advantage of taking into account the correlation structure 
embedded in the data, without considering an additional model for this
structure. An additional level of technical difficulty comes from the substitution of a
random correlation matrix in (6), which renders the assumption of independence
of residuals useless for the purpose of asymptotic analysis. These considerations
have led us to consider a class of transform martingales, which is required
even in the case of independent individuals.

Most of the GEE literature deals with weak consistency of estimators. A 
result regarding the more difficult problem of strong convergence of the GEE 
estimators appears in [27]. The fortuitous choice of the above-mentioned class of
estimating equations enables us to tackle this topic within a martingale frame- 
work and allows us to avoid using complicated approximation techniques (see 
Appendix A2, P]). In the second part of the article (Section 4), we complete our 
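analysis by specifying the sufficient conditions under which equation (6) can be
solved, yielding a sequence of strongly consistent estimators of $\beta$. In addition
to the previous example, this analysis applies also to the case when

$$\mathcal{R}^*_n(\beta) := \hat{\mathcal{R}}_n = \frac{1}{n} \sum_{i=1}^{n} A_i(\tilde\beta_n)^{-1/2} \varepsilon_i(\tilde\beta_n) \varepsilon_i(\tilde\beta_n)^T A_i(\tilde\beta_n)^{-1/2}. \tag{8}$$

Here $\varepsilon_i(\beta) = y_i - \mu_i(\beta)$ and $\{\tilde\beta_n\}_n$ is a strongly consistent sequence of estimators, defined as roots of
the "working independence" equation:

$$\sum_{i=1}^{n} X_i^T (y_i - \mu_i(\beta)) = 0.$$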

This article is organized as follows. In Section 2 we introduce the general
framework. In Section 3 we first prove that equation (5) is optimal in a certain
class of estimating equations; then, we show that the sequence (6) of estimating
equations is asymptotically optimal within the same class (Theorem 3.9). The
main result of Section 4 (Theorem 4.13) identifies the conditions under which
there exists a strongly consistent sequence $\{\hat\beta_n\}_n$ of estimators of $\beta$, defined as
roots of equations (6).

Theorem 3.9 applies only to equations (6) corresponding to the sequence
$\{\mathcal{R}^*_n(\beta)\}_n$ given by (7). On the other hand, Theorem 4.13 applies to equations
([2]) and ^ (although these equations are not asymptotically optimal), as well 
as equations (6) corresponding to sequences $\{\mathcal{R}^*_n(\beta)\}_n$ given by (7) or (8).

Some technical proofs which are needed in Section 4 are included in the
appendices: Appendix A gives the formula for the calculation of the derivative
of $g_n^*(\beta)$, while Appendices B-E contain the proofs of some technical results.
For these proofs, we use techniques which appear in [5] and [27], and results on
strong convergence of martingales.

2 The Model Assumptions and Notation 

We first specify the matrix notation that we employ in the present article (see
[22]). If $\lambda$ is a $p \times 1$ vector, we denote by $\|\lambda\|$ its Euclidean norm. If $A$ is a
$p \times p$ matrix, we denote by $\|A\| = \sup_{\|\lambda\|=1} \|A\lambda\|$ its operator norm, and
by $|||A||| = \sup_{\|\lambda\|=1} |\lambda^T A \lambda|$ its spectral radius. If $A$ is symmetric, then
$|||A||| = \|A\|$. We denote by $\det(A)$ the determinant of $A$, and by $\mathrm{tr}(A)$ the
trace of $A$. If $A$ is a symmetric matrix, we denote by $\lambda_{\min}(A)$ and $\lambda_{\max}(A)$
its minimum eigenvalue, respectively its maximum eigenvalue. For any matrix
$A$, $\|A\| = \{\lambda_{\max}(A^T A)\}^{1/2}$. We let $A^{1/2}$ be the symmetric square root of a
positive definite matrix $A$ and $A^{-1/2} = (A^{1/2})^{-1}$. Finally, we use the matrix
notation $A \le B$ if $B - A$ is non-negative definite, i.e. $\lambda^T A \lambda \le \lambda^T B \lambda$ for any
$p \times 1$ vector $\lambda$.
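
The matrix notation above translates directly into a few numerical helpers. The following is a small sketch; the function names are our own illustrative choices, and the non-negative definiteness check uses a numerical tolerance that is an assumption of the sketch.

```python
import numpy as np

def operator_norm(A):
    """||A|| = sqrt(lambda_max(A^T A))."""
    return float(np.sqrt(np.linalg.eigvalsh(A.T @ A).max()))

def sym_sqrt(A):
    """Symmetric square root of a symmetric positive definite matrix."""
    w, U = np.linalg.eigh(A)
    return (U * np.sqrt(w)) @ U.T       # U diag(sqrt(w)) U^T

def loewner_leq(A, B, tol=1e-10):
    """True if B - A is non-negative definite (A <= B); A, B symmetric."""
    return bool(np.linalg.eigvalsh(B - A).min() >= -tol)

A = np.array([[2.0, 1.0], [1.0, 3.0]])  # symmetric positive definite example
A_half = sym_sqrt(A)
```

For a symmetric matrix such as `A`, `operator_norm(A)` coincides with the largest absolute eigenvalue, in line with the identity $|||A||| = \|A\|$ stated above.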

Throughout this article, we denote by $C$ a generic constant which does not
depend on $n$ and $\beta$, but may differ from case to case.

We now introduce the model assumptions and the estimating equation, which 
is the focus of investigation in the present article. 

For each $i \ge 1$, let $y_i = (y_{i1}, \dots, y_{im})^T$ be the response variable of individual
$i$, where $y_{ij}$ represents the response of individual $i$ at time $j$, and $m$ is a fixed
time horizon, which is the same for all the individuals in the study. Clearly,
the variables $(y_{ij})_{1 \le j \le m}$ display a non-trivial correlation structure, which, in
the main application that we have in mind, is assumed to be the same for all
individuals.

As in a classical regression problem, each outcome variable $y_{ij}$ is thought to
have been influenced by a set of explanatory variables, whose values are given
by a p-dimensional vector $x_{ij}$. The following example illustrates the complexity
of such a study. 

One of our assumptions is that the explanatory variables $x_{ij}$ are non-random
and the response variables $(y_i)_{i \ge 1}$ are defined on a common probability space
$(\Omega, \mathcal{F}, P_\beta)$. The uncertainty in this model is represented by the probability
measure $P_\beta$, which depends on the unknown parameter $\beta \in T$, where $T$ is
an open set in $\mathbb{R}^p$. This is a standard assumption in the theory of statistical
inference. Another usual assumption encountered in the literature is that
the variables $(y_i)_{i \ge 1}$ are independent under $P_\beta$, for any value of $\beta \in T$.

Our model assumptions are: for each $\beta$, and for any $i \le n$, $j \le m$, we have:

$$E_\beta(y_{ij} \mid \mathcal{F}_{i-1}) = \mu(x_{ij}^T \beta) =: \mu_{ij}(\beta), \qquad \mathrm{Var}_\beta(y_{ij} \mid \mathcal{F}_{i-1}) = \phi\, \mu'(x_{ij}^T \beta) =: \phi\, \sigma_{ij}^2(\beta),$$

where $\mathcal{F}_{i-1}$ denotes the $\sigma$-field containing all the information about the variables
$y_1, \dots, y_{i-1}$, $\mu$ is an arbitrary differentiable function with positive derivative,
and $E_\beta(\cdot \mid \mathcal{F}_{i-1})$, $\mathrm{Var}_\beta(\cdot \mid \mathcal{F}_{i-1})$ denote the conditional expectation, respectively
the conditional variance with respect to $P_\beta$. Here $\phi$ is a nuisance parameter; in
what follows, we assume that $\phi = 1$.

Here are the most commonly used link functions $\mu$:

1. in the linear regression, $\mu(y) = y$;

2. in the log regression for count data, $\mu(y) = \exp(y)$;

3. in the logistic regression for binary data, $\mu(y) = \exp(y)/[1 + \exp(y)]$;

4. in the probit regression for binary data, $\mu(y) = \Phi(y)$, where $\Phi$ is the standard
normal distribution function; we have $\Phi'(y) = (2\pi)^{-1/2} \exp(-y^2/2)$.
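
The four link functions, together with their derivatives (which, under the model assumption with $\phi = 1$, play the role of the conditional variances $\sigma_{ij}^2(\beta) = \mu'(x_{ij}^T\beta)$), can be tabulated as follows. This is only a sketch; the dictionary name `LINKS` and its layout are illustrative choices.

```python
import math

def _logistic(t):
    return 1.0 / (1.0 + math.exp(-t))

# Each entry maps a name to the pair (mu, mu') of a link and its derivative.
LINKS = {
    "linear":   (lambda t: t, lambda t: 1.0),
    "log":      (math.exp, math.exp),
    "logistic": (_logistic, lambda t: _logistic(t) * (1.0 - _logistic(t))),
    # Phi via the error function; Phi' is the standard normal density.
    "probit":   (lambda t: 0.5 * (1.0 + math.erf(t / math.sqrt(2.0))),
                 lambda t: math.exp(-0.5 * t * t) / math.sqrt(2.0 * math.pi)),
}

def finite_difference(f, t, h=1e-6):
    """Central difference approximation of f'(t), used only as a sanity check."""
    return (f(t + h) - f(t - h)) / (2.0 * h)
```

All four derivatives are strictly positive, as required of $\mu$ in the model assumptions.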

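By definition, $(y_{ij} - \mu_{ij}(\beta))_{i \ge 1}$ is a martingale difference sequence, with
respect to $P_\beta$, for any $j \le m$.

We let $\mu_i(\beta) = (\mu_{i1}(\beta), \dots, \mu_{im}(\beta))^T$ and $A_i(\beta)$ be the diagonal matrix
with entries $\sigma_{i1}^2(\beta), \dots, \sigma_{im}^2(\beta)$.

Let $\Sigma_i^{(c)}(\beta)$ be the conditional covariance matrix of $y_i$ given $\mathcal{F}_{i-1}$, with
respect to $P_\beta$, whose elements are:

$$\sigma_{i,jk}^{(c)}(\beta) := E_\beta[(y_{ij} - \mu_{ij}(\beta))(y_{ik} - \mu_{ik}(\beta)) \mid \mathcal{F}_{i-1}], \quad 1 \le j,k \le m.$$

In matrix notation, we write

$$\Sigma_i^{(c)}(\beta) = E_\beta[(y_i - \mu_i(\beta))(y_i - \mu_i(\beta))^T \mid \mathcal{F}_{i-1}].$$

The matrix $\Sigma_i^{(c)}(\beta)$ has non-random elements $\sigma_{i1}^2(\beta), \dots, \sigma_{im}^2(\beta)$ on the
diagonal, but possibly random elements off the diagonal.

Some information about the dependence structure (with respect to $P_\beta$)
within the components of $y_i$ is contained in its conditional correlation matrix
$R_i^{(c)}(\beta)$ given $\mathcal{F}_{i-1}$, whose elements are:

$$r_{i,jk}^{(c)}(\beta) := \frac{\sigma_{i,jk}^{(c)}(\beta)}{\sigma_{ij}(\beta)\, \sigma_{ik}(\beta)}, \quad 1 \le j,k \le m.$$

Note that $|r_{i,jk}^{(c)}(\beta)| \le 1$ $P_\beta$-a.s. and $r_{i,jj}^{(c)}(\beta) = 1$ for all $\beta$. In matrix notation,

$$\Sigma_i^{(c)}(\beta) = A_i(\beta)^{1/2} R_i^{(c)}(\beta) A_i(\beta)^{1/2}. \tag{10}$$

Since $\mu_{ij}(\beta) = \mu(x_{ij}^T \beta)$, we have $\partial \mu_{ij}(\beta) / \partial \beta^T = \sigma_{ij}^2(\beta)\, x_{ij}^T$, which in matrix notation becomes:

$$D_i(\beta) := \frac{\partial \mu_i(\beta)}{\partial \beta^T} = A_i(\beta) X_i,$$

where $X_i = (x_{i1}, \dots, x_{im})^T$ is an $m \times p$ matrix.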

In the present article, we consider estimating equations of the form:

$$g_n^*(\beta) := \sum_{i=1}^{n} D_i(\beta)^T V_i^*(\beta)^{-1} (y_i - \mu_i(\beta)) = 0, \quad n \ge 1, \tag{11}$$

where

$$V_i^*(\beta) = A_i(\beta)^{1/2} \mathcal{R}^*_{i-1}(\beta) A_i(\beta)^{1/2} \tag{12}$$

and $\{\mathcal{R}^*_n(\beta)\}_n$ is a sequence of random matrices, which satisfy the following
conditions:

(A) $\mathcal{R}^*_n(\beta)$ is positive-definite and continuously differentiable;
(B) the entries of $\mathcal{R}^*_n(\beta)$ are $\mathcal{F}_n$-measurable, for all $n \ge 1$, $\beta \in T$.

One may think of the matrix $\mathcal{R}^*_{i-1}(\beta)$ as an approximation of the conditional
correlation matrix $R_i^{(c)}(\beta)$, and hence of the matrix $V_i^*(\beta)$ as an approximation
of $\Sigma_i^{(c)}(\beta)$, due to (10) and (12). The fact that we consider $\mathcal{R}^*_{i-1}(\beta)$, instead of
$\mathcal{R}^*_i(\beta)$, as an approximation for $R_i^{(c)}(\beta)$, guarantees that the function $g_n^*(\beta)$ is
a martingale.

The family $\{g_n^*(\beta)\}_n$ is a transform martingale. (This family is a martingale
with respect to $P_\beta$, if the entries of $\mathcal{R}^*_{i-1}(\beta)^{-1}(y_i - \mu_i(\beta))$ are $P_\beta$-integrable.)
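
As an illustration of how such an estimating function is evaluated, the sketch below computes $\sum_i X_i^T A_i(\beta)^{1/2} \mathcal{R}^*_{i-1}(\beta)^{-1} A_i(\beta)^{-1/2}(y_i - \mu_i(\beta))$ for a given list of correlation matrices. The function name, the argument layout, and the use of the log link in the example are assumptions made for the illustration; the caller is responsible for supplying matrices that only use information from individuals $1, \dots, i-1$.

```python
import numpy as np

def estimating_function(beta, X_list, y_list, R_prev_list, mu, mu_prime):
    """Evaluate sum_i D_i^T V_i^{-1} (y_i - mu_i) with V_i = A_i^{1/2} R_{i-1} A_i^{1/2}.

    R_prev_list[i] is the (m x m) matrix used for the i-th individual; it is
    assumed to be positive definite and built from earlier individuals only.
    """
    g = np.zeros(len(beta), dtype=float)
    for X, y, R_prev in zip(X_list, y_list, R_prev_list):
        eta = X @ beta
        sd = np.sqrt(mu_prime(eta))          # diagonal of A_i^{1/2}
        z = (y - mu(eta)) / sd               # A_i^{-1/2} (y_i - mu_i)
        g += X.T @ (sd * np.linalg.solve(R_prev, z))
    return g

# Illustrative use with the log link and identity correlation matrices.
rng = np.random.default_rng(1)
n, m, p = 20, 3, 2
beta = np.array([0.2, -0.1])
X_list = [rng.normal(size=(m, p)) for _ in range(n)]
y_list = [rng.poisson(np.exp(X @ beta)).astype(float) for X in X_list]
g_identity = estimating_function(beta, X_list, y_list, [np.eye(m)] * n, np.exp, np.exp)
g_direct = sum(X.T @ (y - np.exp(X @ beta)) for X, y in zip(X_list, y_list))
```

With identity matrices the weights cancel and the function reduces to $\sum_i X_i^T (y_i - \mu_i(\beta))$, which is what `g_direct` computes.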

We now present several examples of estimating equations of the form (11).



Example 2.1 The "working independence" estimating equations:

$$g_n^{\mathrm{indep}}(\beta) := \sum_{i=1}^{n} X_i^T (y_i - \mu_i(\beta)), \quad n \ge 1, \tag{13}$$

constitute a particular case of (11), with $\mathcal{R}^*_{i-1}(\beta) = I$ for all $i \ge 1$.
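
A root of the working independence equation can be sought with a Newton-type iteration, using $\sum_i X_i^T A_i(\beta) X_i$ as the matrix of (negative) derivatives. The sketch below is one possible implementation under that assumption; it is not taken from the article, and convergence is not guaranteed for arbitrary data or starting values.

```python
import numpy as np

def solve_working_independence(X_list, y_list, mu, mu_prime, beta0,
                               n_iter=50, tol=1e-10):
    """Newton iteration for sum_i X_i^T (y_i - mu(X_i beta)) = 0."""
    X = np.vstack(X_list)                 # stacking is valid: the sum over i
    y = np.concatenate(y_list)            # equals one sum over all rows
    beta = np.asarray(beta0, dtype=float).copy()
    for _ in range(n_iter):
        eta = X @ beta
        g = X.T @ (y - mu(eta))                       # estimating function
        H = X.T @ (mu_prime(eta)[:, None] * X)        # sum_i X_i^T A_i X_i
        step = np.linalg.solve(H, g)
        beta = beta + step
        if np.linalg.norm(step) < tol:
            break
    return beta

# Illustrative check with the linear link, where the root is ordinary least squares.
rng = np.random.default_rng(2)
X_list = [rng.normal(size=(3, 2)) for _ in range(40)]
y_list = [X @ np.array([0.5, -1.0]) + rng.normal(size=3) for X in X_list]
beta_wi = solve_working_independence(
    X_list, y_list, lambda e: e, lambda e: np.ones_like(e), np.zeros(2))
beta_ols = np.linalg.lstsq(np.vstack(X_list), np.concatenate(y_list), rcond=None)[0]
```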

Example 2.2 The "generalized estimating equations" (GEE) (3) studied in
[27] can be written as:

$$g_n^{\mathrm{GEE}}(\beta) := \sum_{i=1}^{n} X_i^T A_i(\beta)^{1/2} R_i(\alpha)^{-1} A_i(\beta)^{-1/2} (y_i - \mu_i(\beta)), \quad n \ge 1. \tag{14}$$

These equations are particular instances of (11). In this case, $\mathcal{R}^*_{i-1}(\beta) = R_i(\alpha)$
for all $i \ge 1$, where $R_i(\alpha)$ are some non-random positive-definite matrices,
depending on a parameter $\alpha$.

For the next two examples, we assume that the conditional correlation matrix
$R_i^{(c)}(\beta)$ is the same for all individuals, i.e.

$$R_i^{(c)}(\beta) = R^{(c)}(\beta), \quad \forall i \ge 1,\ \forall \beta \in T. \tag{15}$$



Example 2.3 As in [50], let $\{\tilde\beta_n\}_n$ be a sequence of consistent estimators of
$\beta_0$, defined as roots of $g_n^{\mathrm{indep}}(\beta) = 0$. (Under some conditions, one can prove
that the sequence $\{\tilde\beta_n\}_n$ exists; see e.g. [B] or Remark 4.16. Here $\beta_0$ is fixed
and represents the true value of the parameter.)

If (15) holds, then under the conditions of Theorem 1 of [2] (and using an
argument similar to the one used in the proof of this theorem), one can show
that the sequence $\{\hat{\mathcal{R}}_n\}_n$ defined by

$$\hat{\mathcal{R}}_n := \frac{1}{n} \sum_{i=1}^{n} A_i(\tilde\beta_n)^{-1/2} (y_i - \mu_i(\tilde\beta_n)) (y_i - \mu_i(\tilde\beta_n))^T A_i(\tilde\beta_n)^{-1/2}, \tag{16}$$

approximates the matrix $R^{(c)}(\beta_0)$, i.e. $\hat{\mathcal{R}}_n - R^{(c)}(\beta_0) \to 0$, element-wise, $P_{\beta_0}$-a.s.
The following pseudo-likelihood equations (PLE's)

$$g_n^{\mathrm{PLE}}(\beta) := \sum_{i=1}^{n} X_i^T A_i(\beta)^{1/2} \hat{\mathcal{R}}_{i-1}^{-1} A_i(\beta)^{-1/2} (y_i - \mu_i(\beta)) = 0, \quad n \ge 1, \tag{17}$$

constitute a particular case of (11) with $\mathcal{R}^*_{i-1}(\beta) = \hat{\mathcal{R}}_{i-1}$ for all $\beta \in T$. (Note
that (17) is different from equation (4) considered in [2], which contains $\hat{\mathcal{R}}_n^{-1}$ in
the middle, instead of $\hat{\mathcal{R}}_{i-1}^{-1}$.)

Example 2.4 Suppose that (15) holds, and there exist some constants $C_\beta > 0$
and $\delta_\beta > 0$ such that

Ep\\MP)-'l\y, - ^i^mf^'' < Cf3, V* > 1. 
Using Lemma A.1 of [2], one can show that the sequence $\{\mathcal{R}^*_n(\beta)\}_n$, defined by

$$\mathcal{R}^*_n(\beta) := \frac{1}{n} \sum_{i=1}^{n} A_i(\beta)^{-1/2} (y_i - \mu_i(\beta)) (y_i - \mu_i(\beta))^T A_i(\beta)^{-1/2}, \quad \beta \in T,\ n \ge 1, \tag{18}$$

approximates the matrix $R^{(c)}(\beta)$, i.e. $\mathcal{R}^*_{n-1}(\beta) - R^{(c)}(\beta) \to 0$ element-wise, $P_\beta$-a.s.
and in $L^1(P_\beta)$, for all $\beta \in T$. The sequence $\{\mathcal{R}^*_n(\beta)\}_n$ satisfies conditions
(A) and (B). Equation (11) can be written as:

$$g_n^*(\beta) = \sum_{i=1}^{n} X_i^T A_i(\beta)^{1/2} \mathcal{R}^*_{i-1}(\beta)^{-1} A_i(\beta)^{-1/2} (y_i - \mu_i(\beta)) = 0, \quad n \ge 1. \tag{19}$$

(Similar estimating equations, which were not transform martingales, were
studied in [H].)
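
The running average of standardized residual products, and its "predictable" version in which the $i$-th individual only sees the matrix built from individuals $1, \dots, i-1$, can be sketched as follows. The function name and the starting matrix `R0` used for the first individual are assumptions of this sketch (this section does not specify how the first matrix is chosen). Note also that an average of fewer than $m$ rank-one terms is singular, so the early matrices returned here are not invertible without further adjustment.

```python
import numpy as np

def predictable_correlations(beta, X_list, y_list, mu, mu_prime, R0=None):
    """Return a list whose i-th entry (0-based) averages the standardized
    residual products of the i individuals that come before it."""
    m = X_list[0].shape[0]
    R0 = np.eye(m) if R0 is None else R0     # hypothetical choice for the first individual
    running = np.zeros((m, m))
    out = []
    for i, (X, y) in enumerate(zip(X_list, y_list)):
        out.append(R0 if i == 0 else running / i)
        eta = X @ beta
        z = (y - mu(eta)) / np.sqrt(mu_prime(eta))   # A_i^{-1/2} (y_i - mu_i)
        running += np.outer(z, z)
    return out

# Illustrative use with the linear link (mu' = 1, so no rescaling happens).
rng = np.random.default_rng(3)
beta = np.array([1.0, 0.5])
X_list = [rng.normal(size=(3, 2)) for _ in range(10)]
y_list = [X @ beta + rng.normal(size=3) for X in X_list]
R_list = predictable_correlations(beta, X_list, y_list,
                                  lambda e: e, lambda e: np.ones_like(e))
resid = [y - X @ beta for X, y in zip(X_list, y_list)]
expected_R3 = sum(np.outer(r, r) for r in resid[:3]) / 3.0
```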

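We consider the following sequence of estimating equations:

$$g_n(\beta) := \sum_{i=1}^{n} D_i(\beta)^T \Sigma_i^{(c)}(\beta)^{-1} (y_i - \mu_i(\beta)) = 0, \quad n \ge 1,$$

which can be written as:

$$g_n(\beta) = \sum_{i=1}^{n} X_i^T A_i(\beta)^{1/2} R_i^{(c)}(\beta)^{-1} A_i(\beta)^{-1/2} (y_i - \mu_i(\beta)) = 0, \quad n \ge 1.$$

We have:

$$M_n^*(\beta) := \mathrm{Cov}_\beta[g_n^*(\beta)] = \sum_{i=1}^{n} X_i^T A_i(\beta)^{1/2} \bar{E}^*_{i-1}(\beta) A_i(\beta)^{1/2} X_i, \tag{20}$$

$$M_n(\beta) := \mathrm{Cov}_\beta[g_n(\beta)] = \sum_{i=1}^{n} X_i^T A_i(\beta)^{1/2} E_i(\beta) A_i(\beta)^{1/2} X_i. \tag{21}$$

Here $\bar{E}^*_{i-1}(\beta) := E_\beta[\mathcal{R}^*_{i-1}(\beta)^{-1} R_i^{(c)}(\beta) \mathcal{R}^*_{i-1}(\beta)^{-1}]$ and $E_i(\beta) := E_\beta[R_i^{(c)}(\beta)^{-1}]$.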

3 Optimal Estimating Equation 

Following the approach of [12], we introduce a general class $\mathcal{H}_n$ of estimating
functions (which accommodate our model), and the concept of optimal estimating
equation in this class. As a preliminary step, we show that the estimating
function $g_n(\beta)$ is optimal within this class. The main result of this section
identifies a set of conditions for the approximation matrices $\{\mathcal{R}^*_n(\beta)\}_{n \ge 1}$, under
which the sequence $\{g_n^*(\beta)\}_{n \ge 1}$ of estimating equations is "asymptotically
optimal" within $\{\mathcal{H}_n\}_{n \ge 1}$.

For each $n \ge 1$, we consider the following class of estimating functions:

$$\mathcal{H}_n = \Big\{ q_n(\beta) = \sum_{i=1}^{n} C_i(\beta)(y_i - \mu_i(\beta)),\ \beta \in T \Big\},$$

where $C_i(\beta)$ is a $p \times m$ random matrix, whose elements are $\mathcal{F}_{i-1}$-measurable
and continuously differentiable (with respect to $\beta$), for all $i \ge 1$. Moreover, if
$c_{i,uj}(\beta)$ denotes the $(u,j)$-element of $C_i(\beta)$, we assume that: for any $\beta \in T$,
$i \ge 1$, $1 \le u,v \le p$ and $1 \le j,k \le m$


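$$E_\beta |c_{i,uj}(\beta)| < \infty, \qquad E_\beta \left| \frac{\partial c_{i,uj}(\beta)}{\partial \beta_v} (y_{ij} - \mu_{ij}(\beta)) \right| < \infty, \qquad E_\beta[c_{i,uj}(\beta)\, \sigma_{i,jk}^{(c)}(\beta)\, c_{i,vk}(\beta)] < \infty.$$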
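For each function $q_n(\beta) \in \mathcal{H}_n$, we introduce the following matrix:

$$\mathcal{E}[q_n(\beta)] := \left\{ E_\beta\left[ \frac{\partial q_n(\beta)}{\partial \beta^T} \right] \right\}^T \{\mathrm{Cov}_\beta[q_n(\beta)]\}^{-1} E_\beta\left[ \frac{\partial q_n(\beta)}{\partial \beta^T} \right].$$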

Remark 3.1 Note that $g_n(\beta)$ is an element of the class $\mathcal{H}_n$. Another element
of $\mathcal{H}_n$ is the GEE function $g_n^{\mathrm{GEE}}(\beta)$ of [27], given by (14).






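Remark 3.2 The function $g_n^{\mathrm{indep}}(\beta)$, given by (13), is also an element of $\mathcal{H}_n$.
For this function, we have:

$$H_n^{\mathrm{indep}}(\beta) := -E_\beta\left[ \frac{\partial g_n^{\mathrm{indep}}(\beta)}{\partial \beta^T} \right] = \sum_{i=1}^{n} X_i^T A_i(\beta) X_i,$$

$$M_n^{\mathrm{indep}}(\beta) := \mathrm{Cov}_\beta[g_n^{\mathrm{indep}}(\beta)] = \sum_{i=1}^{n} X_i^T A_i(\beta)^{1/2} R_i(\beta) A_i(\beta)^{1/2} X_i,$$

where $R_i(\beta)$ is the true (unconditional) correlation matrix of the i-th individual.
The function $g_n^{\mathrm{indep}}(\beta)$ can be viewed as a score function, in a model in which
$R_i(\beta) = I$. (Recall that a score function is the derivative of a log-likelihood
function.) To see this, suppose that there exists a function $a$ such that $a' = \mu$.
Then $g_n^{\mathrm{indep}}(\beta) = \partial \ell_n(\beta) / \partial \beta$, where $\ell_n(\beta) = \sum_{i=1}^{n} \big[ \beta^T X_i^T y_i - \sum_{j=1}^{m} a(x_{ij}^T \beta) \big]$.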

Remark 3.3 Let $g_n^{\mathrm{GEE}}(\beta)$ be the GEE function, given by (14). Let $M_n :=
\mathrm{Cov}_{\beta_0}[g_n^{\mathrm{GEE}}(\beta_0)]$ and $H_n := E_{\beta_0}[P_n(\beta_0)]$, where $P_n(\beta) = -\partial g_n^{\mathrm{GEE}}(\beta) / \partial \beta^T$. (Here
$\beta_0 \in T$ is fixed and represents the "true" value of the parameter.) If $\{\hat\beta_n\}_n$ is
a sequence of weakly consistent estimators of $\beta_0$, then Theorem 4 of [27] says
that

$$M_n^{-1/2} H_n (\hat\beta_n - \beta_0) \to N(0, I),$$

in distribution (under $P_{\beta_0}$). Therefore, in order to obtain an asymptotic
confidence interval for $\beta_0$ of minimal length, one needs to maximize (in the sense of
the non-negative definiteness order) the matrix $H_n M_n^{-1} H_n = \mathcal{E}[g_n^{\mathrm{GEE}}(\beta_0)]$. This
gives a first motivation for Definition 3.5.



Remark 3.4 The matrix $\mathcal{E}[q_n(\beta)]$ can be viewed as a generalization of the Fisher
information matrix. To see this, recall that if $s_n(\beta)$ is a score function in the
class $\mathcal{H}_n$, then $\mathcal{E}[s_n(\beta)]$ coincides with the Fisher information matrix:

$$\mathcal{E}[s_n(\beta)] = \mathrm{Cov}_\beta[s_n(\beta)] = -E_\beta\left[ \frac{\partial s_n(\beta)}{\partial \beta^T} \right].$$

By the Cramér-Rao inequality (see e.g. Theorem 7.3.10, [3]), the best unbiased
estimator $W = W(y_1, \dots, y_n)$ of $\beta$ is the one which attains the Cramér-Rao
lower bound, i.e. for which $\mathrm{Cov}_\beta[W] = \{\mathcal{E}[s_n(\beta)]\}^{-1}$. Among those estimators,
the one with minimum variance is the one for which the Fisher information matrix
is maximal. This provides another motivation for Definition 3.5.



We are now ready to introduce the concept of optimal estimating function
in the class $\mathcal{H}_n$ (see Definition 2.1, [12]).

Definition 3.5 We say that an estimating function $q_n^*(\beta) \in \mathcal{H}_n$ is optimal (or
quasi-score) within the class $\mathcal{H}_n$, if for any $q_n(\beta) \in \mathcal{H}_n$ and for any $\beta \in T$,

$\mathcal{E}[q_n^*(\beta)] - \mathcal{E}[q_n(\beta)]$ is nonnegative-definite.

The following result lies at the origin of our developments.

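Proposition 3.6 The function $g_n(\beta)$ is a quasi-score within the class $\mathcal{H}_n$.

Proof: Using Theorem 2.1, [12], it suffices to show that for any $q_n \in \mathcal{H}_n$,

$$E_\beta[q_n(\beta)\, g_n(\beta)^T] = -E_\beta\left[ \frac{\partial q_n(\beta)}{\partial \beta^T} \right], \quad \forall \beta \in T. \tag{22}$$

First, we treat the left-hand side of (22). Note that $g_n(\beta) = \sum_{i=1}^{n} \tilde{C}_i(\beta) \varepsilon_i(\beta)$,
where $\varepsilon_i(\beta) := y_i - \mu_i(\beta)$ and

$$\tilde{C}_i(\beta) := X_i^T A_i(\beta)^{1/2} R_i^{(c)}(\beta)^{-1} A_i(\beta)^{-1/2} = X_i^T A_i(\beta) \Sigma_i^{(c)}(\beta)^{-1}.$$

Using the fact that $E_\beta(y_i \mid \mathcal{F}_{i-1}) = \mu_i(\beta)$, the cross terms vanish and we obtain:

$$E_\beta[q_n(\beta)\, g_n(\beta)^T] = \sum_{i=1}^{n} E_\beta\big[ C_i(\beta)\, E_\beta(\varepsilon_i(\beta) \varepsilon_i(\beta)^T \mid \mathcal{F}_{i-1})\, \tilde{C}_i(\beta)^T \big] = \sum_{i=1}^{n} E_\beta\big[ C_i(\beta) \Sigma_i^{(c)}(\beta) \Sigma_i^{(c)}(\beta)^{-1} A_i(\beta) X_i \big] = \sum_{i=1}^{n} E_\beta[C_i(\beta) A_i(\beta) X_i]. \tag{23}$$

Next, we treat the right-hand side of (22). If we denote by $c_{i1}(\beta), \dots, c_{im}(\beta)$
the columns of $C_i(\beta)$, then $q_n(\beta) = \sum_{i=1}^{n} \sum_{j=1}^{m} c_{ij}(\beta)(y_{ij} - \mu_{ij}(\beta))$. Using the
product rule, the fact that $c_{ij}(\beta)$ is $\mathcal{F}_{i-1}$-measurable, and $E_\beta(y_{ij} \mid \mathcal{F}_{i-1}) = \mu_{ij}(\beta)$,
we have:

$$E_\beta\left[ \frac{\partial q_n(\beta)}{\partial \beta^T} \right] = \sum_{i=1}^{n} \sum_{j=1}^{m} E_\beta\left[ \frac{\partial c_{ij}(\beta)}{\partial \beta^T}\, E_\beta(y_{ij} - \mu_{ij}(\beta) \mid \mathcal{F}_{i-1}) \right] - \sum_{i=1}^{n} \sum_{j=1}^{m} E_\beta\left[ c_{ij}(\beta) \frac{\partial \mu_{ij}(\beta)}{\partial \beta^T} \right]$$

$$= -\sum_{i=1}^{n} \sum_{j=1}^{m} E_\beta[c_{ij}(\beta)\, \sigma_{ij}^2(\beta)\, x_{ij}^T] = -\sum_{i=1}^{n} E_\beta[C_i(\beta) A_i(\beta) X_i]. \tag{24}$$

Relation (22) follows from (23) and (24). $\Box$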



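Remark 3.7 By taking $q_n(\beta) = g_n(\beta)$ in (22), we see that $g_n(\beta)$ has the
property of a score function:

$$H_n(\beta) := -E_\beta\left[ \frac{\partial g_n(\beta)}{\partial \beta^T} \right] = \mathrm{Cov}_\beta[g_n(\beta)] = M_n(\beta). \tag{25}$$

Also, by taking $q_n(\beta) = g_n^*(\beta)$ in (24), we obtain that

$$H_n^*(\beta) := -E_\beta\left[ \frac{\partial g_n^*(\beta)}{\partial \beta^T} \right] = \sum_{i=1}^{n} X_i^T A_i(\beta)^{1/2} E^*_{i-1}(\beta) A_i(\beta)^{1/2} X_i, \tag{26}$$

where $E^*_{i-1}(\beta) := E_\beta[\mathcal{R}^*_{i-1}(\beta)^{-1}]$. From here, we conclude that:

$$\mathcal{E}[g_n(\beta)] = M_n(\beta) \quad \text{and} \quad \mathcal{E}[g_n^*(\beta)] = H_n^*(\beta) M_n^*(\beta)^{-1} H_n^*(\beta). \tag{27}$$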
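
The ordering between the two information matrices can be checked numerically in a simplified setting. The sketch below assumes, purely for illustration, that the conditional correlation matrix `R` and the substitute matrix `W` are both non-random and common to all individuals, so that the expectations reduce to the matrices themselves; `Z_list` stands for the products $A_i(\beta)^{1/2} X_i$. None of these names come from the article.

```python
import numpy as np

def information_matrices(Z_list, R, W):
    """Return (M_n, H* M*^{-1} H*) for non-random R (true) and W (substitute)."""
    p = Z_list[0].shape[1]
    M = np.zeros((p, p))
    H_star = np.zeros((p, p))
    M_star = np.zeros((p, p))
    R_inv = np.linalg.inv(R)
    W_inv = np.linalg.inv(W)
    for Z in Z_list:
        M += Z.T @ R_inv @ Z                    # information of the optimal function
        H_star += Z.T @ W_inv @ Z               # minus expected derivative
        M_star += Z.T @ W_inv @ R @ W_inv @ Z   # covariance of the substitute function
    return M, H_star @ np.linalg.solve(M_star, H_star)

rng = np.random.default_rng(4)
m, p = 4, 2
Z_list = [rng.normal(size=(m, p)) for _ in range(30)]
R = 0.6 * np.eye(m) + 0.4                       # exchangeable correlation, rho = 0.4
W = np.eye(m)                                   # "working independence" substitute
info_opt, info_sub = information_matrices(Z_list, R, W)
gap = info_opt - info_sub
info_same, info_same_sub = information_matrices(Z_list, R, R)
```

In this setting `gap` is non-negative definite, and it vanishes when the substitute equals the true correlation matrix, which is the finite-sample picture behind the asymptotic comparison carried out in the rest of this section.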

We note that the optimal function $g_n(\beta)$ depends on the unknown conditional
correlation matrix $R_i^{(c)}(\beta)$, and therefore cannot be used in practice. In
the remaining part of this section, we circumvent this difficulty by replacing it
with a consistent estimator $\mathcal{R}^*_{i-1}(\beta)$, proving that this procedure preserves the
optimality of the equation, in the asymptotic sense.

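As in [12], we consider now the "normalized" estimating function:

$$q_n^{(\mathrm{norm})}(\beta) := -\left\{ E_\beta\left[ \frac{\partial q_n(\beta)}{\partial \beta^T} \right] \right\}^{-1} q_n(\beta),$$

for any $q_n(\beta) \in \mathcal{H}_n$. Note that the covariance matrix of $q_n^{(\mathrm{norm})}(\beta)$ is:

$$\mathcal{I}[q_n(\beta)] := E_\beta[q_n^{(\mathrm{norm})}(\beta)\, q_n^{(\mathrm{norm})}(\beta)^T] = \{\mathcal{E}[q_n(\beta)]\}^{-1}. \tag{28}$$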

A sequence $\{q_n^*(\beta)\}_{n \ge 1}$ of estimating functions is asymptotically optimal,
within the collection $\{\mathcal{H}_n\}_{n \ge 1}$, if the corresponding matrix $\mathcal{I}[q_n^*(\beta)]$ is minimal
(in the sense of the non-negative definiteness order), when $n$ is large enough.

More precisely, we have the following definition (see Definition 5.1, [12]):

Definition 3.8 Let $\{q_n^*(\beta)\}_{n \ge 1}$ be a sequence of estimating functions such that
$q_n^*(\beta) \in \mathcal{H}_n$ for all $n \ge 1$. We say that $\{q_n^*(\beta)\}_{n \ge 1}$ is asymptotically optimal
(or asymptotic quasi-score) within the collection $\{\mathcal{H}_n\}_{n \ge 1}$, if for any
sequence $\{q_n(\beta)\}_{n \ge 1}$ with $q_n(\beta) \in \mathcal{H}_n$ for all $n \ge 1$, and for any $\beta \in T$,

$\{\mathcal{I}_n^*(\beta)^{-1/2}\, \mathcal{I}_n(\beta)\, \mathcal{I}_n^*(\beta)^{-1/2} - I\}_{n \ge 1}$ is asymptotically non-negative definite,
in the sense that, for any $p \times 1$ vector $\lambda$ with $\|\lambda\| = 1$,

$$\liminf_{n \to \infty} \lambda^T [\mathcal{I}_n^*(\beta)^{-1/2}\, \mathcal{I}_n(\beta)\, \mathcal{I}_n^*(\beta)^{-1/2} - I] \lambda \ge 0.$$

Here $\mathcal{I}_n^*(\beta) = \mathcal{I}[q_n^*(\beta)]$ and $\mathcal{I}_n(\beta) = \mathcal{I}[q_n(\beta)]$.

(See Remark 5.3 of [12] for a motivation of the previous definition, and the proof
of Proposition 5.4 of [12] for the rigorous meaning of the concept of "asymptotic
non-negative definiteness" introduced above.)

Similarly to [5], we introduce the following assumption:

(H) there exists a constant $C_\beta > 0$ such that $\lambda_{\min}[R_n^{(c)}(\beta)] \ge C_\beta$, $\forall n \ge 1$,
$P_\beta$-almost surely, for all $\beta \in T$

(see condition (H') on p. 528 of [5]).

The following theorem is the main result of this section. 

Theorem 3.9 Suppose that assumption (H) holds and

$$\lambda_{\min}[H_n^{\mathrm{indep}}(\beta)] \to \infty, \quad \text{for all } \beta \in T. \tag{29}$$

Let $\{\mathcal{R}^*_n(\beta)\}_{n \ge 1}$ be a sequence of random matrices which satisfy conditions (A),
(B), as well as the following conditions:

(C) $\mathcal{R}^*_{n-1}(\beta) - R_n^{(c)}(\beta) \to 0$ (element-wise), in probability $P_\beta$, for all $\beta \in T$;
(R) there exists a constant $K_\beta > 0$ such that $\lambda_{\min}[\mathcal{R}^*_n(\beta)] \ge K_\beta$ for all $n \ge 1$,
$P_\beta$-almost surely, for all $\beta \in T$.

Then, the sequence $\{g_n^*(\beta)\}_{n \ge 1}$ is an asymptotic quasi-score within the collection
$\{\mathcal{H}_n\}_{n \ge 1}$.

Proof: By Proposition l3 . 61 and Remark 5.2, [12^, the sequence {g„(/3)}ra>i is an 
asymptotic quasi-score within the collection {'Hn}n>i- By invoking Proposition 
5.5, [12], it suffices to show that the sequences {g^(/3)}ri>i and {g„ (/?)}«>! are 
"asymptotically equivalent" , in the sense that 



dct T[g„(/3)] 
detX[g,*(/3)] 

By (pS)) . this is equivalent to: 

det £[gim 
det 5[g„(/3)] 

which in turn, by (j27p . is equivalent to: 

det H;(/3)2 



1, V/3 e r. 



1, V/3GT, 



1, V/? e r. 



det [M„(/3)M;(/3)] 
Therefore, the proof will be complete, once we show that for any /3 G T 

^^IBM^I and ^^^^M^,. (30) 

detM„(/3) detM„(/3) 

Recalhng the definitions (EOl), (ED) and ([211) of M;(/3), M„(/3) and H;(/3), 

we see that to prove (1501) , it suffices to compare Ej(/?) := -E'/3[R^ (/^)~^] with 

Eti(/3) = il;^[7^*_l (/?)-!] and e;_i(/3) = i?0[7^*_l(/3)-lRf \/?)7e*_i(/3)-i]. 
Using (iJ), (C) and [R), we claim that: (see below) 

E,(/3)-iE*_i(/3) ^ I (elementwise) (31) 

E,(/3)-iE-_i(/3) ^ I (elementwise). (32) 
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We now proceed with the proof of the first convergence in (PO)) . using ([?T|) : 
the second convergence follows by a similar argument, using ((32)) . Let e G (0, 1) 
be arbitrary. By (pij) . there exists an integer no (depending on e and /3), such 
that 

1 - e < A„,i„[E,(/3)-iE*_i(/3)] < A,„ax[E,(/3)-^E*_i(/3)] < 1 + e, Vi > no- 

Therefore, 

(1 - £)M„„,„(/3) < H:„_„(/3) < (1 + e)M„„,„(/3), Vn > no, (33) 

where H;„,„(/3) :=_Er=„„ Xf A,(/?)1/2e*_,(/3)A.(/3)V2x, and M„„,„(/3) := 
Er=no Xf A,(/3)i/2E,(/3)Ai(/3)i/2X,. Using the fact that the determinant is a 
non-decreasing function (with respect to the non- negative definiteness order), 
we obtain: 

il^eY<f^^^<[l + ef, yn>no. (34) 

det M„o,„(/3) 

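Since $\mathbf{E}_i(\beta) \ge m^{-1}\mathbf{I}$, it follows that $\mathbf{M}_n(\beta) \ge m^{-1}\mathbf{H}_n^{\mathrm{indep}}(\beta)$. By (29),
$\lambda_{\min}[\mathbf{M}_n(\beta)] \to \infty$, and therefore $\lambda_{\min}[\mathbf{M}_{n_0,n}(\beta)] \to \infty$ as $n \to \infty$. Hence, there
exists an integer $n_1 \ge n_0$ (depending on $\varepsilon$ and $\beta$) such that $\lambda_{\min}[\mathbf{M}_{n_0,n}(\beta)] \ge
\varepsilon^{-1}\lambda_{\max}[\mathbf{M}_{n_0-1}(\beta)]$ for all $n \ge n_1$. Therefore, $\mathbf{M}_{n_0-1}(\beta) \le \varepsilon\,\mathbf{M}_{n_0,n}(\beta)$ for all
$n \ge n_1$, and

$$\mathbf{M}_{n_0,n}(\beta) \le \mathbf{M}_n(\beta) \le (1+\varepsilon)\,\mathbf{M}_{n_0,n}(\beta), \quad \forall n \ge n_1.$$

From here, we conclude that:

$$\det \mathbf{M}_{n_0,n}(\beta) \le \det \mathbf{M}_n(\beta) \le (1+\varepsilon)^p \det \mathbf{M}_{n_0,n}(\beta), \quad \forall n \ge n_1. \tag{35}$$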

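This argument can be repeated for $\mathbf{H}_n^*(\beta)$, since $\lambda_{\min}[\mathbf{H}^*_{n_0,n}(\beta)] \to \infty$ as
$n \to \infty$ (this is a consequence of (33)). We conclude that there exists an integer
$n_2 \ge n_1$ (depending on $\varepsilon$ and $\beta$) such that

$$\det \mathbf{H}^*_{n_0,n}(\beta) \le \det \mathbf{H}_n^*(\beta) \le (1+\varepsilon)^p \det \mathbf{H}^*_{n_0,n}(\beta), \quad \forall n \ge n_2. \tag{36}$$

From (35) and (36), we obtain:

$$\frac{1}{(1+\varepsilon)^p}\,\frac{\det \mathbf{H}^*_{n_0,n}(\beta)}{\det \mathbf{M}_{n_0,n}(\beta)} \le \frac{\det \mathbf{H}_n^*(\beta)}{\det \mathbf{M}_n(\beta)} \le (1+\varepsilon)^p\,\frac{\det \mathbf{H}^*_{n_0,n}(\beta)}{\det \mathbf{M}_{n_0,n}(\beta)}, \quad \forall n \ge n_2. \tag{37}$$

Finally, using (34) and (37), we obtain:

$$\left(\frac{1-\varepsilon}{1+\varepsilon}\right)^p \le \frac{\det \mathbf{H}_n^*(\beta)}{\det \mathbf{M}_n(\beta)} \le (1+\varepsilon)^{2p}, \quad \forall n \ge n_2.$$

Since $\varepsilon \in (0,1)$ was arbitrary, this concludes the proof of the first convergence in (30).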
We now turn to the proof of (31). Suppose, by contradiction, that there
exist some $\varepsilon_0 > 0$ and a subsequence $(i_n)_n$ for which

$$\|\mathbf{E}_{i_n}(\beta)^{-1}\mathbf{E}^*_{i_n-1}(\beta) - \mathbf{I}\| \ge \varepsilon_0, \quad \forall n \ge 1. \tag{38}$$

Under condition (C), there exists a subsequence of $(i_n)_n$, which we denote
by $(l_n)_n$, for which $\mathcal{R}^*_{l_n-1}(\beta) - \mathbf{R}^{(c)}_{l_n}(\beta) \to 0$ (element-wise), $P_\beta$-a.s. Therefore,
$\mathcal{R}^*_{l_n-1}(\beta)^{-1} - \mathbf{R}^{(c)}_{l_n}(\beta)^{-1} \to 0$ (element-wise) $P_\beta$-a.s. By conditions (R) and (H),

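each of the elements of the matrices $\mathcal{R}^*_{l_n-1}(\beta)^{-1}$ and $\mathbf{R}^{(c)}_{l_n}(\beta)^{-1}$ is bounded
above by the constants $K^{-1}$, respectively $C_0^{-1}$, $P_\beta$-a.s., and hence this last
convergence holds in $L^1(P_\beta)$ as well, i.e.

$$\mathbf{E}^*_{l_n-1}(\beta) - \mathbf{E}_{l_n}(\beta) \to 0 \quad \text{(element-wise)}. \tag{39}$$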

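(Clearly, the element-wise convergence is equivalent to the convergence in norm.)

Note that $\mathbf{R}_i^{(c)}(\beta)$ is an $m \times m$ positive-definite matrix, whose elements are
bounded above by 1, $P_\beta$-a.s. Hence, $\mathbf{R}_i^{(c)}(\beta) \le m\mathbf{I}$, $P_\beta$-a.s. and $\mathbf{R}_i^{(c)}(\beta)^{-1} \ge m^{-1}\mathbf{I}$, $P_\beta$-a.s. From here, we conclude that

$$\mathbf{E}_{l_n}(\beta) = E_\beta[\mathbf{R}^{(c)}_{l_n}(\beta)^{-1}] \ge m^{-1}\mathbf{I}, \quad \text{i.e.} \quad \lambda_{\min}[\mathbf{E}_{l_n}(\beta)] \ge m^{-1}.$$

Therefore,

$$\|\mathbf{E}_{l_n}(\beta)^{-1}\| = \lambda_{\max}[\mathbf{E}_{l_n}(\beta)^{-1}] = \frac{1}{\lambda_{\min}[\mathbf{E}_{l_n}(\beta)]} \le m. \tag{40}$$

Using (39) and (40), we obtain:

$$\|\mathbf{E}_{l_n}(\beta)^{-1}\mathbf{E}^*_{l_n-1}(\beta) - \mathbf{I}\| \le \|\mathbf{E}_{l_n}(\beta)^{-1}\| \cdot \|\mathbf{E}^*_{l_n-1}(\beta) - \mathbf{E}_{l_n}(\beta)\| \to 0. \tag{41}$$

Comparing (38) and (41), we arrive at a contradiction.
The proof of (32) is very similar and is omitted. $\Box$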
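
The bound (40) rests on an elementary fact: an $m \times m$ correlation matrix $\mathbf{R}$ has trace $m$, so $\lambda_{\max}(\mathbf{R}) \le m$ and hence $\mathbf{R}^{-1} \ge m^{-1}\mathbf{I}$. The following minimal Python check illustrates this on a hypothetical, randomly generated correlation matrix; the size `m` and the construction are assumptions for the example.

```python
import numpy as np

rng = np.random.default_rng(1)
m = 4

# Hypothetical positive-definite covariance matrix, rescaled to a correlation matrix.
G = rng.normal(size=(m, 2 * m))
S = G @ G.T
d = np.sqrt(np.diag(S))
R = S / np.outer(d, d)          # unit diagonal, off-diagonal entries in (-1, 1)

lam_max_R = np.linalg.eigvalsh(R).max()                     # at most trace(R) = m
lam_min_R_inv = np.linalg.eigvalsh(np.linalg.inv(R)).min()  # equals 1 / lam_max_R

print(lam_max_R, lam_min_R_inv)
```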



Remark 3.10 Assume that $\mathbf{R}_n^{(c)}(\beta) = \mathbf{R}^{(c)}(\beta)$ for all $n$. In this case, condition
(C) is satisfied by the sequence $\{\mathcal{R}_n^*(\beta)\}_n$ given in Example 2.4. By Theorem 3.9,
the sequence (10) of estimating equations is an asymptotic quasi-score
within the collection $\{\mathcal{H}_n\}_n$.

4 Asymptotic Existence and Strong Consistency 

In this section, we fix a value $\beta_0 \in \mathcal{T}$, which we regard as the "true" (but
unknown) value of the parameter $\beta$. Our aim is to give some sufficient conditions
for the existence of a sequence of estimators $\hat\beta_n$, defined as solutions of (11), such
that $\{\hat\beta_n\}_n$ converges to $\beta_0$ a.s. These conditions are slightly weaker than the
conditions for the asymptotic optimality of the sequence $\{g_n^*(\beta)\}_n$, encountered
in Theorem 3.9. In particular, we may allow the matrix $\mathcal{R}_n^*(\beta)$ not to depend
on $\beta$.

Remark 4.1 In the present section, the a.s. statements refer to the probability
$P_{\beta_0}$. Moreover, we employ the usual convention of omitting the argument $\beta_0$ in
$g_n^*(\beta_0)$, $\mathbf{M}_n(\beta_0)$, $\mathbf{H}_n(\beta_0)$, etc.

Recall that a sequence $\{s_n\}_{n \ge 1}$ of $p$-dimensional random vectors with
components $s_n = (s_n^{(1)}, \ldots, s_n^{(p)})$ is called a $p$-dimensional martingale if

$$E(s_n^{(k)} \mid s_1, \ldots, s_{n-1}) = s_{n-1}^{(k)}, \quad \forall k \in \{1, \ldots, p\}, \ \forall n > 1.$$

Let $B_r = B_r(\beta_0) = \{\beta \in \mathcal{T};\ \|\beta - \beta_0\| \le r\}$ and $\partial B_r = \{\beta;\ \|\beta - \beta_0\| = r\}$.

We begin with a general result, which is similar to Theorem 7 of [57]. By
way of comparison, we note that our result is formulated within a martingale
context and that we do not require that the matrix $\mathcal{D}_n$ be positive definite.

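Theorem 4.2 Let $\{q_n(\beta), \beta \in \mathcal{T}\}_{n \ge 1}$ be a sequence of $p$-dimensional random
functions, such that each $q_n(\beta)$ is continuously differentiable with

$$\mathcal{D}_n(\beta) := -\frac{\partial q_n(\beta)}{\partial \beta^T},$$

and $\{q_n\}_{n \ge 1}$ is a $p$-dimensional martingale with mean zero and $\mathbf{M}_n = \mathrm{Cov}[q_n]$.
Let $\{a_n\}_n$ be a sequence of constants such that, for some $C > 0$ and $N \ge 1$,

$$a_n \ge C\lambda_{\max}(\mathbf{M}_n), \quad \forall n \ge N. \tag{42}$$

Assume that the following conditions hold:

(I) $\lambda_{\min}[\mathbf{M}_n] \to \infty$;

(S) there exist some constants $\delta > 0$, $c_0 > 0$ such that, with probability $P_{\beta_0}$
equal to 1, there exist some random numbers $r_1 > 0$, $n_1 \ge 1$ for which
(i) $|\lambda^T \mathcal{D}_n(\beta)\lambda| > 0$ for all $\lambda$, $\|\lambda\| = 1$, and for all $\beta \in B_{r_1}$, $n \ge n_1$;
(ii) $\lim_{r \to 0} \limsup_{n \to \infty} a_n^{-1/2-\delta} \sup_{\beta \in B_r} |||\mathcal{D}_n(\beta) - \mathcal{D}_n||| = 0$;
(iii) $|\lambda^T \mathcal{D}_n \lambda| \ge c_0 a_n^{1/2+\delta}$ for all $\lambda$, $\|\lambda\| = 1$, and for all $n \ge n_1$.

Then, there exist a sequence $\{\hat\beta_n\}_n \subset \mathcal{T}$ and a random number $n_0$ such that:

(a) $P(q_n(\hat\beta_n) = 0, \text{ for all } n \ge n_0) = 1$;

(b) $\hat\beta_n \to \beta_0$ a.s.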

Remark 4.3 Condition (S)(ii) says that, with probability $P_{\beta_0}$ equal to 1, the
sequence $\{a_n^{-1/2-\delta}\mathcal{D}_n(\beta)\}_{n \ge 1}$ is equicontinuous at $\beta_0$.

The proof of Theorem 4.2 combines analytic and stochastic techniques. On
the analytic side, we use a result from topology, which provides an ingenious
method for proving the existence of a zero of the one-to-one continuously
differentiable function $q_n(\beta)$. Breaking down condition (S) into components has
the advantage of allowing us to formulate some sufficient conditions for the strong
consistency of estimators, in terms of conditions on the matrices and functions
that define our estimating equations (see Theorem 4.13). On the other hand,
condition (I) enables us to apply the strong law of large numbers (SLLN) to the
martingale $\{q_n\}_{n \ge 1}$.

For the sake of completeness, we state some auxiliary results below. 

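Lemma 4.4 Let $T : \mathcal{T} \subset \mathbb{R}^p \to \mathbb{R}^p$ be a one-to-one continuously differentiable
function and $\beta_0 \in \mathcal{T}$ such that $B_r(\beta_0) \subset \mathcal{T}$. If

$$\|T(\beta_0)\| \le \inf_{\beta \in \partial B_r} \|T(\beta) - T(\beta_0)\|,$$

then there exists a (unique) $\beta \in B_r$ such that $T(\beta) = 0$.

(This result is a consequence of Lemma A of [4].)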

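Lemma 4.5 (Martingale SLLN) Let $\{s_n\}_{n \ge 1}$ be a $p$-dimensional martingale
with mean zero and covariance matrix $\mathbf{M}_n$. If $\lambda_{\min}(\mathbf{M}_n) \to \infty$, then

$$\frac{s_n}{[\lambda_{\max}(\mathbf{M}_n)]^{1/2+\delta}} \to 0 \quad \text{a.s.} \quad \forall \delta > 0.$$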

(This result can be proved component- wise, using Theorem 3 of [TF with p = \.) 
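
To make the normalization in Lemma 4.5 concrete, here is a small simulation sketch in the simplest one-dimensional case, a symmetric random walk, for which $\mathbf{M}_n = n$. The sample size, the value of `delta` and the choice of increments are assumptions for the illustration; a single simulated path is of course not a proof of almost sure convergence.

```python
import numpy as np

rng = np.random.default_rng(3)
n, delta = 100_000, 0.25

# Martingale differences with mean zero and variance one (symmetric +/-1 steps).
steps = rng.choice([-1.0, 1.0], size=n)
s = np.cumsum(steps)                  # s_k, a one-dimensional martingale

# In dimension one, lambda_max(M_k) = Var(s_k) = k.
var = np.arange(1, n + 1)
normalized = s / var ** (0.5 + delta)

# Largest normalized value over the second half of the path.
tail_max = np.abs(normalized[n // 2:]).max()
print(tail_max)
```

Since $|s_k|$ is typically of order $k^{1/2}$, the normalized sequence decays roughly like $k^{-\delta}$.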

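Proof of Theorem 4.2: Let $\Omega_1$ be the event on which condition (S) holds,
with $P_{\beta_0}(\Omega_1) = 1$. For each $\omega \in \Omega_1$ and $n \ge n_1$, we consider the function

$$T_n(\beta) := a_n^{-1/2-\delta} q_n(\beta), \quad \beta \in B_{r_1}.$$

This function is continuously differentiable; it is also one-to-one, since $\mathcal{D}_n(\beta)$ is
non-singular for all $\beta \in B_{r_1}$.

We prove that there exists an event $\Omega_0 \subset \Omega_1$ with $P_{\beta_0}(\Omega_0) = 1$, such that
for every $\omega \in \Omega_0$ and for any $\varepsilon > 0$, there exist some random numbers $r_\varepsilon \in
(0, \varepsilon)$, $r_\varepsilon \le r_1$ and $n_\varepsilon \ge n_1$ such that

$$\|T_n(\beta_0)\| \le \inf_{\beta \in \partial B_{r_\varepsilon}} \|T_n(\beta) - T_n(\beta_0)\|, \quad \forall n \ge n_\varepsilon. \tag{43}$$

We claim that the conclusion of the theorem follows from here. To see this,
note that by Lemma 4.4, on the event $\Omega_0$, for any $\varepsilon > 0$, there exists $\hat\beta_{n,\varepsilon} \in B_{r_\varepsilon}$
such that $T_n(\hat\beta_{n,\varepsilon}) = 0$ for all $n \ge n_\varepsilon$. Let $\varepsilon_0 > 0$ be fixed and denote $r_0 = r_{\varepsilon_0}$
and $n_0 = n_{\varepsilon_0}$. We define $\hat\beta_n := \hat\beta_{n,\varepsilon_0}$, for all $n \ge n_0$. Clearly, on the event $\Omega_0$,
$T_n(\hat\beta_n) = 0$ for all $n \ge n_0$, i.e. (a) holds. If $\varepsilon > 0$ is arbitrary, then for any
$n \ge n'_\varepsilon := \max\{n_\varepsilon, n_0\}$, both $\hat\beta_n$ and $\hat\beta_{n,\varepsilon}$ are zeros of the function $T_n$, in the
ball $B_{r'_\varepsilon}$ with radius $r'_\varepsilon = \min\{r_\varepsilon, r_0\}$. Since $T_n$ is one-to-one, we conclude that
$\hat\beta_{n,\varepsilon} = \hat\beta_n$ for all $n \ge n'_\varepsilon$. This argument shows that $\hat\beta_{n,\varepsilon}$ does not depend on
$\varepsilon$, if $n$ is large enough. Finally, part (b) of the conclusion follows, since on the
event $\Omega_0$, $\|\hat\beta_n - \beta_0\| \le r_\varepsilon < \varepsilon$ for all $n \ge n'_\varepsilon$.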
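We now turn to the proof of (43). We first treat the right-hand side. Let
$\omega \in \Omega_1$ be fixed. By condition (S)(ii), for any $\varepsilon > 0$, there exist some random
numbers $r_\varepsilon \in (0, \varepsilon)$, $r_\varepsilon \le r_1$ and $N_\varepsilon \ge n_1$, such that for all $n \ge N_\varepsilon$, $\beta \in B_{r_\varepsilon}$,

$$a_n^{-1/2-\delta}\,|||\mathcal{D}_n(\beta) - \mathcal{D}_n||| \le \varepsilon,$$

and hence, for any vector $\lambda$, $\|\lambda\| = 1$, we have $a_n^{-1/2-\delta}|\lambda^T[\mathcal{D}_n(\beta) - \mathcal{D}_n]\lambda| \le \varepsilon$.

In particular, it follows that for all $n \ge N_\varepsilon$, $\beta \in B_{r_\varepsilon}$,

$$a_n^{-1/2-\delta}|\lambda^T \mathcal{D}_n(\beta)\lambda| \ge a_n^{-1/2-\delta}|\lambda^T \mathcal{D}_n \lambda| - \varepsilon. \tag{44}$$

Let $n \ge N_\varepsilon$ and $\beta \in \partial B_{r_\varepsilon}$ be arbitrary. Using Taylor's formula, there exists
$\bar\beta_n \in B_{r_\varepsilon}$ such that $q_n(\beta) - q_n = -\mathcal{D}_n(\bar\beta_n)(\beta - \beta_0)$. By letting $\lambda = (\beta - \beta_0)/r_\varepsilon$,
we obtain:

$$\|q_n(\beta) - q_n\|^2 = (\beta - \beta_0)^T \mathcal{D}_n(\bar\beta_n)^T \mathcal{D}_n(\bar\beta_n)(\beta - \beta_0) = \lambda^T \mathcal{D}_n(\bar\beta_n)^T \mathcal{D}_n(\bar\beta_n)\lambda\, r_\varepsilon^2 \ge [\lambda^T \mathcal{D}_n(\bar\beta_n)\lambda]^2 r_\varepsilon^2,$$

where for the last inequality, we used the fact that $\lambda^T C^T C \lambda \ge (\lambda^T C \lambda)^2$ for
any matrix $C$ and for any vector $\lambda$ with $\|\lambda\| = 1$ (Lemma 1, [27]). Taking the
square root, multiplying by $a_n^{-1/2-\delta}$, and using (44) and (S)(iii), we obtain:

$$\|T_n(\beta) - T_n(\beta_0)\| \ge a_n^{-1/2-\delta}|\lambda^T \mathcal{D}_n(\bar\beta_n)\lambda|\, r_\varepsilon \ge \{a_n^{-1/2-\delta}|\lambda^T \mathcal{D}_n \lambda| - \varepsilon\}\, r_\varepsilon \ge (c_0 - \varepsilon)\, r_\varepsilon.$$

Hence,

$$\inf_{\beta \in \partial B_{r_\varepsilon}} \|T_n(\beta) - T_n(\beta_0)\| \ge (c_0 - \varepsilon)\, r_\varepsilon, \quad \forall n \ge N_\varepsilon. \tag{45}$$

We now treat the left-hand side of (43). By Lemma 4.5 (and using (42) and (I)),

$$\|T_n(\beta_0)\| = a_n^{-1/2-\delta}\|q_n\| \to 0 \quad \text{a.s.} \tag{46}$$

Denote by $\Omega_2$ the event where (46) holds. Let $\Omega_0 = \Omega_1 \cap \Omega_2$ and fix $\omega \in \Omega_0$.
For any $\varepsilon > 0$, let $r_\varepsilon$, $N_\varepsilon$ be as above. By (46), there exists $n_\varepsilon \ge N_\varepsilon$ such that

$$\|T_n(\beta_0)\| \le (c_0 - \varepsilon)\, r_\varepsilon, \quad \forall n \ge n_\varepsilon. \tag{47}$$

Relation (43) follows from (45) and (47). $\Box$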
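
The inequality $\lambda^T C^T C \lambda \ge (\lambda^T C \lambda)^2$ for unit vectors $\lambda$, used in the proof above, is the Cauchy-Schwarz inequality applied to $\lambda$ and $C\lambda$. A brief numerical sanity check on randomly drawn matrices and unit vectors (the dimension and the number of draws are arbitrary choices for the illustration):

```python
import numpy as np

rng = np.random.default_rng(2)
p = 4
worst_gap = np.inf

for _ in range(1000):
    C = rng.normal(size=(p, p))       # arbitrary, not necessarily symmetric
    lam = rng.normal(size=p)
    lam /= np.linalg.norm(lam)        # unit vector
    lhs = lam @ C.T @ C @ lam         # ||C lam||^2
    rhs = (lam @ C @ lam) ** 2        # <lam, C lam>^2
    worst_gap = min(worst_gap, lhs - rhs)

print(worst_gap)   # should be non-negative up to rounding error
```

No symmetry or definiteness of $C$ is needed, which is why the theorem does not require $\mathcal{D}_n$ to be positive definite.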

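In what follows, we will apply the previous result to the case of the estimating
function $g_n^*(\beta)$. As in [57], we have:

$$\mathcal{D}_n^*(\beta) := -\frac{\partial g_n^*(\beta)}{\partial \beta^T} = \mathbf{H}_n(\beta) - \mathbf{B}_n(\beta) - \mathcal{E}_n(\beta), \quad \beta \in \mathcal{T}$$

(see Appendix A for the exact formulas of $\mathbf{H}_n(\beta)$, $\mathbf{B}_n(\beta)$, $\mathcal{E}_n(\beta)$).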
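We define the following constants:

$$\gamma_n^{(0),\mathrm{indep}} := \max_{i \le n,\, j \le m} x_{ij}^T (\mathbf{H}_n^{\mathrm{indep}})^{-1} x_{ij}, \qquad \tilde a_n := \lambda_{\max}(\mathbf{H}_n^{\mathrm{indep}})\, \gamma_n^{(0),\mathrm{indep}}.$$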
Suppose that $\mu$ is three times continuously differentiable. For any $r > 0$ and
$n \ge 1$, we let



7r„(r 



sup max 



m"(x?:/3) 



M'(x^/3) 



sup max 






1/2 



r) = sup max 

l3eBr i<n,j<m 



m"'(x?:/3) 



M'(x?:/9) 



supmaxA,„ax[(7^*_l)^/^7^*_l(/3)-l(7^*_l)l/2] 
fieBy *^" 

supmaxA,„ax[(7^*_l)^/'7^*_l(/3)-l(7^*_l)l/2 

/3e-Br *^" 



sup max An 

l3eBri<^';l<P 



_d_ 

m 



K-i(/3) 



As in [27], we introduce the following assumption:

(AH) there exist $C > 0$, $r_0 > 0$ such that $k_n^{[l]}(r_0) \le C$ for all $n \ge 1$, $l = 2, 3$.

Note that (AH) holds if the covariates are bounded.
We introduce a new assumption:

(K) $\lim_{r \to 0} \limsup_{n \to \infty} r\,\tilde a_n^{1/2} = 0$.

We have the following result, whose proof is given in Appendix B.

Lemma 4.6 Under (AH) and (K), there exist $r_1 > 0$ and $n_1 \ge 1$ such that

$$\eta_n(r) \le C r\,\tilde a_n^{1/2}, \quad \text{for all } r \in (0, r_1),\ n \ge n_1.$$

In particular, under (AH) and (K), $\lim_{r \to 0} \limsup_{n \to \infty} \eta_n(r) = 0$.

We consider the following condition on the sequence $\{\mathcal{R}_n^*(\beta)\}_n$:

(R') there exists $C > 0$ such that $\lambda_{\min}(\mathcal{R}_n^*) \ge C$ for all $n \ge 1$ a.s.

Clearly, condition {R') is weaker than condition (R) (encountered in Theorem 
The following fact is an immediate consequence of condition (R'):

$$\lambda_{\max}[\mathcal{R}^*_{i-1}(\beta)^{-1}] \le C\pi_n(r), \quad \forall \beta \in B_r,\ \forall i \le n, \ \text{a.s.} \tag{48}$$

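Remark 4.7 Suppose that with probability $P_{\beta_0}$ equal to 1, the sequence $\{\mathcal{R}_n^*(\beta)\}_{n \ge 1}$
is equicontinuous at $\beta_0$, i.e.

$$\lim_{r \to 0} \limsup_{n \to \infty} \sup_{\beta \in B_r} \|\mathcal{R}_n^*(\beta) - \mathcal{R}_n^*\| = 0 \quad \text{a.s.} \tag{49}$$

If the sequence $\{\mathcal{R}_n^*(\beta)\}_n$ satisfies (R'), then $\lim_{r \to 0} \limsup_{n \to \infty} \rho_n(r) = 0$ a.s.
and $\lim_{r \to 0} \limsup_{n \to \infty} \pi_n(r) = 1$ a.s.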
Example 4.8 Assume that (15) holds. Let $\{\mathcal{R}_n^*(\beta)\}_n$ be the sequence introduced
in Example 2.4. Using the same argument as in the proof of Proposition
2, [2], one can show that if:

(i) limr^olinisup„^oo77„(r) = 0, 

(h) A„ax(HJJ^dcp) < ^„ fo^ all „ > ;^^ 

(iii) E\\A;^^^e,f+^ < C for all i > 1, for some d>0 
then (111) holds. 



Let $\{a_n\}_n$ be a non-decreasing sequence of constants with $\lim_n a_n = \infty$, and

The following three lemmas examine the asymptotic behavior of the three
terms of $\mathcal{D}_n^*(\beta)$. Their respective proofs are given in Appendices C, D and E
(see also [1]).

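Lemma 4.9 Suppose that (AH) and (K) hold. Let $\{\mathcal{R}_n^*(\beta)\}_n$ be a sequence of
random matrices which satisfy (A), (B) and (R'). If

($C_1'$) $\lim_{r \to 0} \limsup_{n \to \infty} \delta_n \pi_n(r)\eta_n(r) = 0$ a.s.

($C_2$) $\lim_{r \to 0} \limsup_{n \to \infty} \delta_n \rho_n(r) = 0$ a.s.

then

$$\lim_{r \to 0} \limsup_{n \to \infty} a_n^{-1/2-\delta} \sup_{\beta \in B_r} |||\mathbf{H}_n(\beta) - \mathbf{H}_n||| = 0 \quad \text{a.s.}$$
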

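Lemma 4.10 Suppose that (AH) and (K) hold. Let $\{\mathcal{R}_n^*(\beta)\}_n$ be a sequence
of random matrices which satisfy (A), (B) and (R'). If

($C_1$) $\lim_{r \to 0} \limsup_{n \to \infty} r\,\delta_n \pi_n(r)\,\tilde a_n^{1/2} = 0$ a.s.

($C_3$) $\lim_{r \to 0} \limsup_{n \to \infty} r\,\delta_n \pi_n^2(r)\, q_n(r) = 0$ a.s.,

then

$$\lim_{r \to 0} \limsup_{n \to \infty} a_n^{-1/2-\delta} \sup_{\beta \in B_r} |||\mathbf{B}_n(\beta)||| = 0 \quad \text{a.s.}$$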

Lemma 4.11 Suppose that (AH) and (K) hold. Let {7?.*(/3)}„ be a sequence 
of random matrices which satisfy (A), (B) and {R'). If 

(C4) limsupn_E[7r^(r)]a„Aniax(HJJ^ '^p) < 00, where a„ = max{a„,a„} 

n — >oo 

(C5) limsupn^[7r4(r)g2(r)]A,nax(H;';d^P) < ^ for all r > 0, 

n — ^00 

then 

lim a-^/'^-^ sup |||£^„(/3)ll| =0 a.s. 
Remark 4.12 1. In the case of the linear regression model, $\mu(x) = x$ for all $x$.
Hence $\eta_n(r) = 0$ for all $r > 0$, $n \ge 1$, and ($C_1'$) is automatically satisfied. In this
case $\mathbf{A}_i = \mathbf{I}$ and $\mathbf{H}_n^{\mathrm{indep}} = \sum_{i=1}^n \mathbf{X}_i^T \mathbf{X}_i$.

2. Lemma 4.6 shows that condition ($C_1$) is stronger than ($C_1'$).

3. If $\mathcal{R}_n^*(\beta)$ does not depend on $\beta$, then $\rho_n(r) = q_n(r) = 0$ for all $r > 0$, $n \ge
1$; hence, ($C_2$), ($C_3$) and ($C_5$) are satisfied. In particular, this is the case of
Examples 2.1, 2.2 and 2.3.



As in [2], we introduce the following assumption:

(H') there exists a constant $C > 0$ such that $\lambda_{\min}(\mathbf{R}_n^{(c)}) \ge C$, $\forall n \ge 1$, a.s.

Here is the main result of this section.

Theorem 4.13 Suppose that (H'), (AH) and (K) hold. Let $\{\mathcal{R}_n^*(\beta)\}_n$ be a
sequence of random matrices which satisfy (A), (B), (R') and

(E) there exists a constant $C > 0$ such that $\lambda_{\max}(\mathcal{R}_n^*) \le C$, $\forall n \ge 1$, a.s.

Suppose that conditions ($C_1$)-($C_5$) are satisfied with $a_n = \lambda_{\max}(\mathbf{H}_n^{\mathrm{indep}})$. If

(i) $\lambda_{\min}(\mathbf{H}_n^{\mathrm{indep}}) \to \infty$,

(ii) there exist an integer $N \ge 1$ and some constants $\delta > 0$, $c_0 > 0$ such that
$\lambda_{\min}(\mathbf{H}_n^{\mathrm{indep}}) \ge c_0[\lambda_{\max}(\mathbf{H}_n^{\mathrm{indep}})]^{1/2+\delta}$, $\forall n \ge N$,

then there exist a sequence $\{\hat\beta_n\}_n \subset \mathcal{T}$ and a random number $n_0$ such that:

(a) $P(g_n^*(\hat\beta_n) = 0, \text{ for all } n \ge n_0) = 1$;

(b) $\hat\beta_n \to \beta_0$ a.s.



Remark 4.14 Hypotheses (i) and (ii) of Theorem 4.13 are indeed very mild.
To see this, consider the following stronger form of hypothesis (ii):

(ii)' $\lambda_{\min}[\mathbf{H}_n^{\mathrm{indep}}(\beta)] \ge c_0[\lambda_{\max}(\mathbf{H}_n^{\mathrm{indep}})]^{1/2+\delta}$, $\beta \in \mathcal{T}$, $\forall n \ge N$.

Using the approach of [6], one can prove that under (i) and (ii)', there exists
a sequence of strongly consistent estimators, defined as roots of the "working
independence" equation $g_n^{\mathrm{indep}}(\beta) = 0$ (see Remark [



Remark 4.15 (Discussion of hypotheses (i) and (ii) of Theorem 4.13) In the
case of the usual regression models, conditions (i) and (ii) of Theorem 4.13 can
be simplified into conditions which speak only about the asymptotic behavior
of the covariates $(x_{ij})_{i \ge 1}$ for $j = 1, \ldots, m$.

1. In the case of the linear regression model, $\mu(x) = x$ and $\mathbf{H}_n^{\mathrm{indep}} =
\sum_{i=1}^n \sum_{j=1}^m x_{ij}x_{ij}^T$. Hypothesis (i) holds if $\sum_{i \ge 1} \lambda_{\min}(x_{ij}x_{ij}^T) = \infty$ for some
$j$, whereas (ii) holds if there exist an integer $N \ge 1$ and some constants
$\delta > 0$, $c_0 > 0$ such that

$$\sum_{i=1}^n \sum_{j=1}^m \lambda_{\min}(x_{ij}x_{ij}^T) \ge c_0 \left[\sum_{i=1}^n \sum_{j=1}^m \lambda_{\max}(x_{ij}x_{ij}^T)\right]^{1/2+\delta}, \quad \forall n \ge N. \tag{50}$$

If the covariates $(x_{ij})_{i \ge 1}$ are bounded, then (50) holds if there exists some
$j = 1, \ldots, m$ such that $\sum_{i=1}^n \lambda_{\min}(x_{ij}x_{ij}^T) \ge c_0 n^{1/2+\delta}$ for all $n \ge N$.

2. In the case of the logistic regression model, ^{x) = exp(a;)/[l + exp(x)] 
and Hj,"dcp = Y^^=l E7=i exp{x5./3o}/(l + explx^/^oD^x^.x^.. Assuming that 
the parameter set T is bounded by a constant C > 0, and letting aij = 
exp{-C||xij||}/(l + exp{C||x^||}), we see that 



ix:E«..x,x^<Hjr'^-< 



j=i i=i 



1=1 j=i 



x'^ 



Hypothesis (i) holds if X]i>i aij'^min(xyX?J) = oo for some j, whereas {ii) holds 
if there exist an integer N >1 and some constants (5 > 0, cq > such that 

1/2+5 



EE"y^™'i(^y^S") > Co 

i=l i=i 



EEa 

i=i j=i 



max V^^zj ^^ij 



X„- ) 



Vn> iV. 



(51) 



If the covariates (xij)i>i are bounded, then ([?T|) holds if there exists some 
j = 1,...,TO such that X)r=i "u'^min(xijX^.) > con^/^+'' for all n > N. (See 
also [7] for a related analysis, in the case m = I.) 

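3. In the case of the Poisson regression model, $\mu(x) = \exp(x)$ and $\mathbf{H}_n^{\mathrm{indep}} =
\sum_{i=1}^n \sum_{j=1}^m \exp\{x_{ij}^T\beta_0\}\, x_{ij}x_{ij}^T$. Assume that the parameter set $\mathcal{T}$ is bounded by
a constant $C > 0$, and let $b_{ij} = \exp\{C\|x_{ij}\|\}$. Then

$$\sum_{i=1}^n \sum_{j=1}^m \frac{1}{b_{ij}}\, x_{ij}x_{ij}^T \le \mathbf{H}_n^{\mathrm{indep}} \le \sum_{i=1}^n \sum_{j=1}^m b_{ij}\, x_{ij}x_{ij}^T.$$

Hypothesis (i) holds if $\sum_{i \ge 1} \lambda_{\min}(x_{ij}x_{ij}^T)/b_{ij} = \infty$ for some $j$, whereas (ii) holds
if there exist an integer $N \ge 1$ and some constants $\delta > 0$, $c_0 > 0$ such that

$$\sum_{i=1}^n \sum_{j=1}^m \frac{1}{b_{ij}}\, \lambda_{\min}(x_{ij}x_{ij}^T) \ge c_0 \left[\sum_{i=1}^n \sum_{j=1}^m b_{ij}\, \lambda_{\max}(x_{ij}x_{ij}^T)\right]^{1/2+\delta}, \quad \forall n \ge N. \tag{52}$$

If the covariates $(x_{ij})_{i \ge 1}$ are bounded, then (52) holds if there exists some
$j = 1, \ldots, m$ such that $\sum_{i=1}^n \lambda_{\min}(x_{ij}x_{ij}^T) \ge c_0 n^{1/2+\delta}$ for all $n \ge N$.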
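
Hypotheses (i) and (ii) of Theorem 4.13 can be inspected numerically on a given design. The sketch below does this for the linear regression model, where $\mathbf{H}_n^{\mathrm{indep}} = \sum_{i \le n} \mathbf{X}_i^T\mathbf{X}_i$, using hypothetical bounded covariates drawn uniformly from $[-1, 1]$; the sizes `n`, `m`, `p` and the value of `delta` are assumptions for the illustration. A finite sample can only suggest, not establish, the asymptotic conditions.

```python
import numpy as np

rng = np.random.default_rng(4)
n, m, p, delta = 2000, 3, 2, 0.25

# Hypothetical bounded covariates: X[i] is the m x p design matrix of subject i.
X = rng.uniform(-1.0, 1.0, size=(n, m, p))

# Per-subject contributions X_i^T X_i and their running sums H_k^{indep}, k = 1..n.
per_subject = np.einsum('imk,iml->ikl', X, X)
H = np.cumsum(per_subject, axis=0)

# Eigenvalues of each H_k (ascending order along the last axis).
eig = np.linalg.eigvalsh(H)
lam_min, lam_max = eig[:, 0], eig[:, -1]

# Hypothesis (i): lam_min should grow without bound.
# Hypothesis (ii): lam_min / lam_max^{1/2 + delta} should stay bounded away from 0.
ratio = lam_min / lam_max ** (0.5 + delta)
print(lam_min[-1], ratio[n // 4], ratio[-1])
```

For this design both eigenvalues grow linearly in $n$, so the ratio grows like $n^{1/2-\delta}$ and (ii) is comfortably satisfied.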



Proof of Theorem 4.13: We will apply Theorem 4.2 to the function $q_n(\beta) =
g_n^*(\beta)$, by taking $a_n = \lambda_{\max}(\mathbf{H}_n^{\mathrm{indep}})$. Due to (R'), $E|g_n^*| < \infty$, for any $n$.



1/2^* 



1/2-, 



Hence, {g,*},i is a martingale. Recall that M* = X]"=i ^i^i ^i-i-^i ^ 



where E, 



£;[(7^*_l)-lRf^(7^*_l)-l] (see m 



We first prove that 



holds. Using {R') and the fact that Rj < toI 
for aU i > 1 a.s, it follows that {n*_^)-^R!f\TZ*_^)-^ < CI for aU i > 1, a.s. 
Hence E.^ < CI for alH > 1 and 
Hence $\lambda_{\max}(\mathbf{M}_n^*) \le C\lambda_{\max}(\mathbf{H}_n^{\mathrm{indep}})$, i.e. $a_n$ satisfies relation (42).

We now prove that (I) holds. For any $p \times 1$ vector $\lambda$ with $\|\lambda\| = 1$, we have:



n 



i=l 



Y^)^X^Al^'(nUr'W(n*^ir'Al/'X.^ > minA„,„(Rr). 



minA„,in[(7^*_l)~'] ' A^HJ^^'^<=PA > CqA^HJ^^'^'^pA, 

using {H') and (i?). Taking the expectation (with respect to Pf3„), we conclude 
that A^M; A > CoA^Hj;'d'=PA, i.e. 

M; > CoHJ^'^'^P, Vn > 1. 

The fact that (/) holds follows from our hypothesis (i). 

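We now prove that (S) holds. Part (ii) follows directly from Lemmas 4.9,
4.10 and 4.11. To prove that parts (i) and (iii) hold, note that condition (E)
and our hypothesis (ii) imply that for any $p \times 1$ vector $\lambda$ with $\|\lambda\| = 1$:

$$\lambda^T \mathbf{H}_n \lambda = \sum_{i=1}^n \lambda^T \mathbf{X}_i^T \mathbf{A}_i^{1/2}(\mathcal{R}^*_{i-1})^{-1}\mathbf{A}_i^{1/2}\mathbf{X}_i \lambda \ge \min_{i \le n} \lambda_{\min}[(\mathcal{R}^*_{i-1})^{-1}] \cdot \lambda^T \mathbf{H}_n^{\mathrm{indep}} \lambda \ge C\lambda^T \mathbf{H}_n^{\mathrm{indep}} \lambda \ge C a_n^{1/2+\delta}, \quad \forall n \ge N.$$

From Lemmas 4.9, 4.10 and 4.11, it follows that, with probability 1, there exist
some random numbers $r_1 > 0$, $n_1 \ge 1$ such that

$$a_n^{-1/2-\delta}\,|\lambda^T[\mathbf{H}_n(\beta) - \mathbf{H}_n]\lambda - \lambda^T[\mathbf{B}_n(\beta) + \mathcal{E}_n(\beta)]\lambda| \le C/2, \quad \forall \beta \in B_{r_1},\ n \ge n_1.$$

Recalling that $\mathcal{D}_n(\beta) = \mathbf{H}_n(\beta) - \mathbf{B}_n(\beta) - \mathcal{E}_n(\beta)$, we conclude that:

$$|\lambda^T \mathcal{D}_n(\beta)\lambda| \ge |\lambda^T \mathbf{H}_n \lambda| - |\lambda^T[\mathbf{H}_n(\beta) - \mathbf{H}_n]\lambda - \lambda^T[\mathbf{B}_n(\beta) + \mathcal{E}_n(\beta)]\lambda| \ge C a_n^{1/2+\delta} - C a_n^{1/2+\delta}/2 > 0,$$

for all $\beta \in B_{r_1}$, $n \ge n_1$.

This concludes the proof of parts (i) and (iii) of (S). $\Box$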

Remark 4.16 The proof of Theorem 4.13 can be adapted to apply to the
sequence $\{g_n^{\mathrm{indep}}(\beta)\}_{n \ge 1}$ (in fact, only the proof of Lemma 4.9 needs to be
adapted, since $-\partial g_n^{\mathrm{indep}}(\beta)/\partial \beta^T = \mathbf{H}_n^{\mathrm{indep}}(\beta)$). More precisely, assume that (H')
holds and

(K') $\lim_{r \to 0} \limsup_{n \to \infty} \eta_n(r) = 0$.

Under conditions (i) and (ii) of Theorem 4.13, one can prove that there exist a
sequence $\{\hat\beta_n\}_n$ and a random integer $n_0$ such that

$$P(g_n^{\mathrm{indep}}(\hat\beta_n) = 0,\ \forall n \ge n_0) = 1 \quad \text{and} \quad \hat\beta_n \to \beta_0 \ \text{a.s.}$$

Note that (K') holds if the covariates are bounded, or if we have a linear regression
model. Our set-up covers a more general situation than Theorem 2 of [6]; in
our case, neither the joint distribution of the data nor the covariance matrices
are known.
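Remark 4.17 Suppose that

$$a_n = \lambda_{\max}(\mathbf{H}_n^{\mathrm{indep}}).$$

Then $\delta_n = [\lambda_{\max}(\mathbf{H}_n^{\mathrm{indep}})]^{1/2-\delta}$ and conditions ($C_1$)-($C_3$) become:

($C_1^*$) $\lim_{r \to 0} \limsup_{n \to \infty} r\,\pi_n(r)\,(\gamma_n^{(0),\mathrm{indep}})^{1/2}\,[\lambda_{\max}(\mathbf{H}_n^{\mathrm{indep}})]^{1-\delta} = 0$ a.s.

($C_2^*$) $\lim_{r \to 0} \limsup_{n \to \infty} \rho_n(r)\,[\lambda_{\max}(\mathbf{H}_n^{\mathrm{indep}})]^{1/2-\delta} = 0$ a.s.

($C_3^*$) $\lim_{r \to 0} \limsup_{n \to \infty} r\,\pi_n^2(r)\,q_n(r)\,[\lambda_{\max}(\mathbf{H}_n^{\mathrm{indep}})]^{1/2-\delta} = 0$ a.s.
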

Remark 4.18 (Discussion of the assumptions on $\{\mathcal{R}_n^*(\beta)\}_n$ in Theorem 4.13)
In the case of Examples 2.2-2.4, the assumptions imposed in Theorem 4.13 on
the sequence $\{\mathcal{R}_n^*(\beta)\}_n$ can be summarized as follows:

Example        Assumptions
2.2 (GEE)      ($C_1^*$), ($C_4$), (R'), (E)
2.3 (PLE)      ($C_1^*$), ($C_4$), (R'), (E)
2.4 (AQS)      ($C_1^*$), ($C_2^*$), ($C_3^*$), ($C_4$), ($C_5$), (R'), (E)

We note that assumption {E) is implied by: 

(C) Ti-n-i ^ ^n ^ (element-wise) a.s., 

which is satisfied by the sequences {7?.* (/?)}„ given in Examples 12.31 and 12.41 

Remark 4.19 (Weak consistency and asymptotic normality) Let /3o be the 
true value of the parameter, P;(/3) = d^llp)/d[f, M*, = M;(/3o), and H; = 
H* (/3o). Using a methodology similar to \2\ and [27j, if we let 

< = mmaxA,nax[(7^*_l)-l], ^ = A,„ax[(M:)-iH:], 

and B;(r) = {/3 e T; ||(H;)i/2(/3„ -/3o)|| < (T*)i/2r}, then under the condition 
that {d^T^}n is bounded, and 

{CC*) sup ||(H:)-V2p*(;3)(h:J-i/2_i||^o Vr > 0, 

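one can prove that there exists a sequence $\{\hat\beta_n\}_n$ of weakly consistent estimators
of $\beta_0$, such that $P_{\beta_0}(g_n^*(\hat\beta_n) = 0) \to 1$ and

$$(\mathbf{M}_n^*)^{-1/2} g_n^*(\beta_0) = (\mathbf{M}_n^*)^{-1/2}\mathbf{H}_n^*(\hat\beta_n - \beta_0) + o_{P_{\beta_0}}(1).$$

By invoking a martingale central limit theorem, under the appropriate
conditions, one can conclude that $(\mathbf{M}_n^*)^{-1/2} g_n^*(\beta_0) \xrightarrow{d} N(0, \mathbf{I})$, and therefore

$$(\mathbf{M}_n^*)^{-1/2}\mathbf{H}_n^*(\hat\beta_n - \beta_0) \xrightarrow{d} N(0, \mathbf{I}).$$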
In view of (31) and (32), the matrices $\mathbf{M}_n^*$ and $\mathbf{H}_n^*$ are asymptotically the same,
and hence

$$(\mathbf{H}_n^*)^{1/2}(\hat\beta_n - \beta_0) \xrightarrow{d} N(0, \mathbf{I}).$$

The matrix $\mathbf{H}_n^*$ depends on the unknown parameter $\beta_0$; it also depends on the
matrix $\mathcal{R}^*_{i-1}(\beta_0)$ through the value $\mathbf{E}^*_{i-1}(\beta_0) = E_{\beta_0}[\mathcal{R}^*_{i-1}(\beta_0)^{-1}]$, which cannot
be calculated from the data. As suggested by Remark 8, [27], in practice, one
may approximate the matrix $\mathbf{H}_n^*$ by the matrix

$$\hat{\mathbf{H}}_n := \sum_{i=1}^n \mathbf{X}_i^T \mathbf{A}_i(\hat\beta_n)^{1/2}\,\mathcal{R}^*_{i-1}(\hat\beta_n)^{-1}\,\mathbf{A}_i(\hat\beta_n)^{1/2}\,\mathbf{X}_i$$

and obtain a confidence interval for $\beta_0$. (We do not discuss here the theoretical
issues related to this practical implementation.)
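
As a rough illustration of this plug-in step, the following Python sketch evaluates a matrix of the form $\sum_i \mathbf{X}_i^T \mathbf{A}_i(\beta)^{1/2}\mathcal{R}_i^{-1}\mathbf{A}_i(\beta)^{1/2}\mathbf{X}_i$ for the logistic mean function and turns it into Wald-type intervals. Everything specific here is an assumption for the example: the function name `plug_in_H`, the simulated covariates, the value of `beta_hat`, and the identity matrices standing in for $\mathcal{R}^*_{i-1}(\hat\beta_n)^{-1}$, which in practice would have to be computed from the data.

```python
import numpy as np

def plug_in_H(X, beta, R_inv):
    """Sum over i of X_i^T A_i(beta)^{1/2} R_inv_i A_i(beta)^{1/2} X_i, logistic mean.

    X: (n, m, p) covariates; beta: (p,); R_inv: (n, m, m) inverse working matrices.
    """
    eta = X @ beta                                  # linear predictors, shape (n, m)
    mu = 1.0 / (1.0 + np.exp(-eta))
    a_half = np.sqrt(mu * (1.0 - mu))               # diagonal of A_i(beta)^{1/2}
    W = a_half[:, :, None] * R_inv * a_half[:, None, :]
    return np.einsum('imk,imj,ijl->kl', X, W, X)

rng = np.random.default_rng(5)
n, m, p = 200, 3, 2
X = rng.uniform(-1.0, 1.0, size=(n, m, p))          # hypothetical covariates
beta_hat = np.array([0.5, -0.3])                    # hypothetical estimate
R_inv = np.broadcast_to(np.eye(m), (n, m, m))       # placeholder inverse matrices

H_hat = plug_in_H(X, beta_hat, R_inv)
se = np.sqrt(np.diag(np.linalg.inv(H_hat)))         # approximate standard errors
ci = np.column_stack([beta_hat - 1.96 * se, beta_hat + 1.96 * se])
print(ci)
```

The interval uses the normal approximation for $\hat{\mathbf{H}}_n^{1/2}(\hat\beta_n - \beta_0)$ suggested by the limit above, with 1.96 as the usual 95% normal quantile.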

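Remark 4.20 (The linear regression model) We consider separately the 4
estimating equations introduced in Examples 2.1-2.4, in the case of the linear
regression model (i.e. $\mu(x) = x$).

1. (Working independence) The equation introduced in Example 2.1 has the
solution

$$\hat\beta_n^{\mathrm{indep}} = \left(\sum_{i=1}^n \mathbf{X}_i^T\mathbf{X}_i\right)^{-1}\left(\sum_{i=1}^n \mathbf{X}_i^T y_i\right)$$

and asymptotic covariance matrix $\mathbf{H}_n^{\mathrm{indep}} = \sum_{i=1}^n \mathbf{X}_i^T\mathbf{X}_i$.

2. (GEE) The equation introduced in Example 2.2 has the solution

$$\hat\beta_n^{\mathrm{GEE}} = \left(\sum_{i=1}^n \mathbf{X}_i^T\mathbf{R}_i(\alpha)^{-1}\mathbf{X}_i\right)^{-1}\left(\sum_{i=1}^n \mathbf{X}_i^T\mathbf{R}_i(\alpha)^{-1} y_i\right)$$

and asymptotic covariance matrix $\mathbf{H}_n^{\mathrm{GEE}} = \sum_{i=1}^n \mathbf{X}_i^T\mathbf{R}_i(\alpha)^{-1}\mathbf{X}_i$. (The matrix
$\mathbf{R}_i(\alpha)$ is supposed to be known.)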

3. (Pseudo-likelihood equation) Let TZq — TZi — I and 



^^- '■= I E(y' - ^^p';:'"'niy^ - x./sj^'^^p)^, k = 2, 

4 = 1 

The equation introduced in Example 12.31 has the solution 

/^r" - (E Xf ^-.\X.^ i^± Xf 7^^_\y.^ 

and asymptotic covariance matrix Hj^'"^ = X]r=i Xf 7?.~_\Xi. 
4. (Asymptotic Quasi-Score) Let TZq = TZ\ — I and 

k 

i=l 
The equation introduced in Example 12.41 is a polynomial of degree (2m)"~^ + 1
in fj. We select (i^^^ to be the root of this polynomial, which is closest to 
^mdop^ This root cannot be written in closed form. The asymptotic covariance 
matrix of /J^QS jg g^QS ^ ^^^^ Xf 7^*_l(/3AQS)-lx,. 

The fact that the sequence {g^(/3)}n>i (given by Example 1 2. 4 p is an AQS 
tells us that, if n is sufficiently large, then for any fc = 1, . . . ,p, the asymptotic 
variance of /3j,„ (i.e. the (fc, fc)-element of the matrix H^*^^) is smaller than 

the asymptotic variance of each of P™n'^ , Pk^ ^^^ Pk^ ■ (Here, we used the 
notation /3„ — (/3i,„, . . . , Pp.-n)^ in each of the 4 cases.) 
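
For the linear model, the working-independence and GEE solutions have the same closed form, $(\sum_i \mathbf{X}_i^T\mathbf{W}_i\mathbf{X}_i)^{-1}\sum_i \mathbf{X}_i^T\mathbf{W}_i y_i$, with $\mathbf{W}_i = \mathbf{I}$ and $\mathbf{W}_i = \mathbf{R}_i(\alpha)^{-1}$ respectively. The sketch below computes both on simulated data. The helper name `weighted_ls`, the exchangeable correlation with `rho = 0.5`, the independence across subjects and the true parameter `beta0` are all assumptions made for the illustration; they are not part of the model studied in this article.

```python
import numpy as np

def weighted_ls(X, y, W):
    """Solve sum_i X_i^T W_i (y_i - X_i beta) = 0; return (beta, sum_i X_i^T W_i X_i).

    X: (n, m, p) covariates; y: (n, m) responses; W: (n, m, m) weight matrices.
    """
    lhs = np.einsum('imk,imj,ijl->kl', X, W, X)
    rhs = np.einsum('imk,imj,ij->k', X, W, y)
    return np.linalg.solve(lhs, rhs), lhs

rng = np.random.default_rng(6)
n, m, p = 300, 4, 2
beta0 = np.array([1.0, -2.0])                        # hypothetical true parameter
X = rng.normal(size=(n, m, p))

# Hypothetical exchangeable correlation matrix, assumed known as in the GEE case.
rho = 0.5
R = (1 - rho) * np.eye(m) + rho * np.ones((m, m))
L = np.linalg.cholesky(R)
eps = rng.normal(size=(n, m)) @ L.T                  # each row has covariance R
y = X @ beta0 + eps

I_stack = np.broadcast_to(np.eye(m), (n, m, m))
R_inv_stack = np.broadcast_to(np.linalg.inv(R), (n, m, m))

beta_indep, H_indep = weighted_ls(X, y, I_stack)     # working independence
beta_gee, H_gee = weighted_ls(X, y, R_inv_stack)     # GEE with known R
print(beta_indep, beta_gee)
```

Both estimates should be close to `beta0`; with noise-free responses they coincide with it exactly.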

A Formulas for the terms of $\mathcal{D}_n^*(\beta)$

We write g;(/3) = g;^i(/3) + g^Af^), where 

n 

g;2(/3) = 5^XfA,(/3)l/27^*.l(/3)-lA,(/?)-l/2^^. 



Note that 
dg:,AP) 

where 



bW(/3) + B^^Hp) + B[fl(/3) - H„(/3) := B„(/3) - H„(/3) 
£W(/3) +£,?!(/?)+ 4^1 (;3):=£„(/3) 



H„(/3) - J2^fMpy/^^zuiPr'MPy^^^^ 

n 

bW(/3) = 5^Xfdiag{7^:_l(/3)-lA,(/3)-l/2[^,-;,,(/3)]}GW(/3)X. 

n 

B[?l(/3) = 5]xfA,(/3)V27e*_^(/3)-idiag{M.-A^.(/3)}Gpl(/3)X, 



1=1 



bS(/3) = ^XfA.(/3)V2 



?T 



7e*-i(/3)-^ 



A.(/3)-i/2[/i,-^,(/?)] 



41(/?) = ^Xfdiag{7^*.l(/3)-lA.(/3)-l/2e.}G^'(/3)X. 

n 

4^1(/3) = ^XfA,(/3)l/27^*_l(/3)-Miag(£0Gf(/3)X, 
<]m - Y^^jj^m'



/2 



^n-M-^ 



Mm 



-1/2 



£j. 



Here, the matrices G' {(3), G\ '(/?) are the same as in [S^, i.e. 






m"(xL/3) 



GfiP) 



diag 



2m'(xL/3)V2 



2M'(xfi/3)3/2"-" 2/i'(xL/3)3/2 



[31 [31 [31 

We denote by b^^ ; (/3) , e„ ; (/3) the Z-th column vectors of B„ (/?), respectively 
£^n\l3), for any 1 <l<p. 



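B Proof of Lemma 4.6

Clearly, $\eta_n(r) \le \psi_n(r)$, where

$$\psi_n(r) := \sup_{\beta, \beta' \in B_r} \max_{i \le n,\, j \le m} \left|\frac{\mu'(x_{ij}^T\beta')}{\mu'(x_{ij}^T\beta)} - 1\right|.$$

Let $\beta, \beta' \in B_r$ be arbitrary. By Taylor's formula, there exists $\bar\beta_{ij}$ between $\beta$
and $\beta'$ such that $\mu'(x_{ij}^T\beta') - \mu'(x_{ij}^T\beta) = \mu''(x_{ij}^T\bar\beta_{ij})\, x_{ij}^T(\beta' - \beta)$. By (AH),

$$\left|\frac{\mu'(x_{ij}^T\beta')}{\mu'(x_{ij}^T\beta)} - 1\right| = \left|\frac{\mu''(x_{ij}^T\bar\beta_{ij})}{\mu'(x_{ij}^T\beta)}\right| \cdot |x_{ij}^T(\beta' - \beta)| \le C\left|\frac{\mu'(x_{ij}^T\bar\beta_{ij})}{\mu'(x_{ij}^T\beta)}\right| \cdot |x_{ij}^T(\beta' - \beta)|.$$

Since $|x_{ij}^T(\beta' - \beta)|^2 \le \|x_{ij}^T(\mathbf{H}_n^{\mathrm{indep}})^{-1/2}\|^2 \cdot \|(\mathbf{H}_n^{\mathrm{indep}})^{1/2}(\beta' - \beta)\|^2 \le \gamma_n^{(0),\mathrm{indep}}\,
\lambda_{\max}(\mathbf{H}_n^{\mathrm{indep}})\, r^2 = \tilde a_n r^2$, it follows that $\psi_n(r) \le C\psi_n(r)\,\tilde a_n^{1/2} r + C\tilde a_n^{1/2} r$, i.e.

$$\psi_n(r)\,(1 - C\tilde a_n^{1/2} r) \le C\tilde a_n^{1/2} r.$$

By assumption (K), there exist $r_1 > 0$ and $n_1 \ge 1$ such that $\tilde a_n^{1/2} r \le 1/(2C)$
for all $r \in (0, r_1)$, $n \ge n_1$. Hence, $\psi_n(r) \le C\tilde a_n^{1/2} r$ (with a possibly larger constant $C$) for all $r \in (0, r_1)$, $n \ge n_1$. $\Box$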

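C Proof of Lemma 4.9

We write $\mathbf{H}_n(\beta) - \mathbf{H}_n = \mathbf{H}_n^{[1]}(\beta) + \mathbf{H}_n^{[2]}(\beta) + \mathbf{H}_n^{[3]}(\beta)$, where:

$$\mathbf{H}_n^{[1]}(\beta) = \sum_{i=1}^n \mathbf{X}_i^T[\mathbf{A}_i(\beta)^{1/2} - \mathbf{A}_i^{1/2}]\,\mathcal{R}^*_{i-1}(\beta)^{-1}\mathbf{A}_i(\beta)^{1/2}\mathbf{X}_i$$

$$\mathbf{H}_n^{[2]}(\beta) = \sum_{i=1}^n \mathbf{X}_i^T\mathbf{A}_i^{1/2}\,[\mathcal{R}^*_{i-1}(\beta)^{-1} - (\mathcal{R}^*_{i-1})^{-1}]\,\mathbf{A}_i(\beta)^{1/2}\mathbf{X}_i$$

$$\mathbf{H}_n^{[3]}(\beta) = \sum_{i=1}^n \mathbf{X}_i^T\mathbf{A}_i^{1/2}(\mathcal{R}^*_{i-1})^{-1}[\mathbf{A}_i(\beta)^{1/2} - \mathbf{A}_i^{1/2}]\,\mathbf{X}_i$$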
Let $\lambda$ be an arbitrary $p \times 1$ vector with $\|\lambda\| = 1$. By the Cauchy-Schwarz
inequality, $|\lambda^T\mathbf{H}_n^{[1]}(\beta)\lambda| \le T_1(\beta, \lambda)^{1/2}\, T_2(\beta, \lambda)^{1/2}$, where



Ti(/3,A) := VA^Xf A,(/3)l/27^*_^(^)-lA,(/3)l/2x,A < maxA^,ax[7^*_l(/3)-^] 

^ — ^ i<n 

maxAL.[Ar'/'A,(/3)i/2] • A^HJJ'^^pA < C^„(r)A,„ax(HJJ''^P) 

n 

T,(AA) := ^A^Xf[A.(/3)V2-Ay^]K_i(/3)-i[A.(/3)V2-Ay^]X.A 

< max A,„ax[7^*_l(/3)-l] max ALx[Ar'^'A,(/3)i/2 - I] 

maxAL.[Ar'/'A,(/3)V2] . A^Hj^^dcp^ < C^„(r)r;2(r)A.„ax(Hj^^'^<=P). 



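For estimating the terms above, we used (R'), (48) and Lemma 4.6. Hence,

$$|\lambda^T\mathbf{H}_n^{[1]}(\beta)\lambda| \le C\lambda_{\max}(\mathbf{H}_n^{\mathrm{indep}})\,\pi_n(r)\,\eta_n(r).$$

Note that $|\lambda^T\mathbf{H}_n^{[2]}(\beta)\lambda| \le T_1'(\lambda)^{1/2}\, T_2'(\beta, \lambda)^{1/2}$, where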

n 

TiiX) := 5]A^XfAy^(7e*_i)-iAy2x,A<CA„,ax(HJ,"'i°P) 

1=1 

n 

T^(/3,A) := 5]A^XfA,(/3)V2(7^*_J-V2[(7e*_^)l/27e*_^(^)-l(7^*_Jl/2 _i]2 

(7e*_i)-'/'A,(/3)i/2x,A 

< max A,„ax[(K*_i)^/'K*_i(/3)-i(K*_i)i/2 _ j]2 ^laxA,„ax[(7^*_l)-l] 

'i<n ^^"^ 

maxA,„ax[A7'/'A,(/3)i/2] • A^HJ^^'^'^pA 

< Cp2(r)A,„ax(HJ."d^P). 

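Hence

$$|\lambda^T\mathbf{H}_n^{[2]}(\beta)\lambda| \le C\lambda_{\max}(\mathbf{H}_n^{\mathrm{indep}})\,\rho_n(r).$$

Similarly, one can prove that:

$$|\lambda^T\mathbf{H}_n^{[3]}(\beta)\lambda| \le C\lambda_{\max}(\mathbf{H}_n^{\mathrm{indep}})\,\eta_n(r).$$

Since $\pi_n(r) \ge 1$, by ($C_1'$) and ($C_2$),

$$\lim_{r \to 0} \limsup_{n \to \infty} a_n^{-1/2-\delta} \sup_{\beta \in B_r} \sup_{\|\lambda\| = 1} |\lambda^T\mathbf{H}_n^{[k]}(\beta)\lambda| = 0 \quad \text{a.s.},$$

for $k = 1, 2, 3$. It follows that

$$\lim_{r \to 0} \limsup_{n \to \infty} a_n^{-1/2-\delta} \sup_{\beta \in B_r} \sup_{\|\lambda\| = 1} |\lambda^T[\mathbf{H}_n(\beta) - \mathbf{H}_n]\lambda| = 0 \quad \text{a.s.}$$

$\Box$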
D Proof of Lemma 4.10

Let $\lambda$ be an arbitrary $p \times 1$ vector with $\|\lambda\| = 1$. Using the fact that $\mathrm{diag}(v)\mathbf{D}u =
\mathbf{D}\,\mathrm{diag}(u)v$, for any vectors $u, v$ and for any diagonal matrix $\mathbf{D}$, we write:



A^bW(/3)A = 5^A^XfGfl(/3)diag(X,A)7e*_i(/3)-iA,(/3)-i/2[^,, _^,(;3)] 

n 

A^bP1(/3)A = ^A^Xf A,(/3)l/27^*_^(/3)-ldiag(X,A)Gfl(/?)k -M.(/3)]. 
By the Cauchy-Schwartz inequahty, |A^bW(/3)A| < h{l3,\f/'^l2{l3fl'^, where 

11 

h{P,X) := ^A^X,GW(/3)diag(X,A)7^*_l(/?)-Miag(X,A)GW(/3)X,A 

< maxA„,ax[7^*_l(/3)"l] max A„,ax[diag2(X,A)] maxXl^JA-'/^G^^\(3)] 

i<n ■i^^i ^^^ 

< C^„(r)a„A,„ax(HJ;'d^P) 

< maxA„,axK_i(/3)-^]maxALx[A,(/3.)'/'A,(/3)-iA,(/3,)'/'] 
max A„,ax[A-iA,(/3,)] • (/? - /3o)^HJ,"'i°P(/3 - /?o) 

< Cr2^„(r)A„,ax(HJ;^''<=P). (53) 

For estimating the term l2{(3), we used Lemma |4.6[ (|48)) . and the Taylor's 
formula: ^i(/3) — ^^ = Ai(/3i)Xi(/3 — /3o), where /3i is between /3 and /3o- For 
/i(/3, A), we used (gl]), (Aif), Lemma IT^ and the fact that 

Amax[diag^(XiA)] < a„ for aU i < n, (54) 

(To prove dMD, note that jx^Ap < A„,ax(Hj,"dcp)|x^.(H»dep)-ix^^.| < 
A^ax(H:fdcp)^^o),indep ^ ^^_^ Therefore 

|A^BW(/3)A| < Cray2^„(r)A,„ax(HJr'*°P). 
Using condition (Ci), it follows that 

limlimsupa;;;^/^"'' sup sup |A'^BW(/3)A| = a.s. 

The term B„ (/?) is treated by similar methods. More precisely, jA-^Bn (/3)A| < 
/((/3,A)1/2/^(/3)1/2, where 



I[{p,X) := ^A^XfA,(/3)l/27^*_,(;3)-ldiag(X,A)Gfl(/3)A,(/3)l/27^*_, 
A,(/3)l/2Gfl(/3)diag(X,A)7^*_l(/3)-lA,(/3)l/2x,A

< CttI (r) max Xl,^[Gf^ (/3) A, (/?)Gpl (/3)] max A„,ax[diag'(X, A)] 

maxA„,ax[A-iA,(/3)] • X'^U'^^^pX 

< Cr2A„,ax(HJ,'^'*<=P). 

To estimate /((/?, A) we used (gS]) , dMl) , Lemma SH (i?') and (AiJ). The 
estimate of l2iP) was obtained similarly to (|53p . using (i?'). 
By condition (Ci), we conclude that: 

limlimsupa,;^/^"'' sup sup |A'^B,[fl(/3)A| = a.s. 

r^O n^oo /3e-Br ||A|| = 1 

To treat the term which involves B„ (/3), we note that |A'^B„ (/3)Ap < 
ELi l^^b|fl;|^ where b[fl,(/3) denotes the l-th column of Bk^l(/3). By Theorem 
9.2, ^, we have: 



a/3, 



■7^*_l(/3)-^--7^,r_l(/3)-l 



df3, 



■nuw 



T^uipy'- 



(55) 



Now, for any I e {!,..., p} fixed, we have: lA^b'^l < /;'_;(/3, A)i/2/2(/3)i/2, 
where /2(/3) is as above and 



/"z(/3,A) := E^^X^A,(/3)l/27^*_l(/3)-l 



^^^um 



T^-iipr' 



^7^:_,(/3) 



7^:_l(/3)-lA,(/3)l/2x,A 

r d 

< niaxA^^ax TT^T^t-iiP) 

^Tjjindcp;^ 

< C7r3(r)g2(r)A„ax(HJ,"<^^P), 



max ALx[7^ti (/?)"'] max A,„ax[A, '/'A,(/3) 

2<n i<n 



1/21 



usmg 



and Lemma l4.6l Hence, 

|A^b[fl I < Cr7r2(r)q„(r)A.„ax(Kr'=P). 



Finally, (C3) implies that lim,.^o limsup^^^^ |A^b^ 'j = a.s., and therefore 
limlimsupa;;^/^^'^ sup sup |A^B[fl(/3)Al = a.s. 

D 
E Proof of Lemma 4.11

Let A be an arbitrary pxl vector with || A|| = 1. We first treat the terms £n (/?) 

[21 

and £n'{P). Using the fact that diag(v)Du = Ddiag(u)v, for any vectors u, v 
and for any diagonal matrix D, we write X'^£n'{l3)X — X]fe=i UL (/?, A) and 
A^£:['l (/3)A = EL4 Uk'' (/?, A), where 

n 

f/W(/3,A) = ^A^XfGWdiag(X,A)7^*_l(/3)-lA,^l/2,^ 

n 

C/P1(/3,A) = ^A^XfGW(/3)diag(X,A)7e*_i (/?)-! (A-^/^/3)-A-^/^)e, 

n 

UlS(3,X} = ^A^Xf(GW(/3)-GW)diag(X,A)7e*_i(/3)-iA-i/^£, 

n 

C/W(/3,A) = ^A^XfAy^7e*„i(/3)-Miag(X,A)Gfl£. 

n 

C/[^l(/3,A) = ^A^Xf(Aj/2(/3)-Aj/^)K_i(/3)-Miag(X,A)Gfl(/3)£. 

n 

C/^l(/3, A) = ^ A^Xf Ay27e*_i(/3)-Miag(X,A)(Gfl(/3) - Gf% 

1=1 

Note that {[/„ (/?, A), J>^}„ is a martingale (with respect to P/3(,), and hence 
{sup^g^^ suppii ]^ |J7„ '(/3, A)|, JFnjn is a submartingale (with respect to Ppg), 
for any r > 0. 

In what follows, we will prove that for any r > 0, there exists a constant 
C > (depending on r) such that 

£ sup sup |[/J,''1(/3,A)| <C, Vn>l. (56) 

l3eBr ||A|| = 1 

By the martingale convergence theorem (Theorem 2.5, [TO]), it will follow that 
{sup^g^^ suppil^]^ \Un {P,X)\}n converges a.s. Using the fact that q„ -^ 00, 
we obtain that: for any r > 0, 

lim a-^/"^'^ sup sup |t/^'=l(/3, A)| = a.s., fc = l,...,6, 

n^oo l3eB,\\\\\ = l 

from which it will follow that 

lim a-^^^-^ sup sup \X^£l^H(3)X\ = a.s., fc = l,2. 

We now turn to the proof of (j56p . By the Cauchy-Schwartz inequality 
\UL'\P,X)\ < Ji{I3,XY/^jI'\ where J2 := Er=i efA-^e, and 

n 

Ji(/3,A) := ^A^XfGfldiag(X,A)7^*_l(/3)-2diag(X,A)GWx,A 
< max ALx[7^*-l (/?)"'] max{A„,ax[diag2(X,A)]} max ALJA-^/'gW]

z<n i<n 2<.n 

For the estimation of the term Ji(/3, A), we used (|48)) . ((54l) and (AiJ). 
Note that E{efA'[^ei) = trRj = m for aU i, and hence 



E{J,) = e[J2 ef Ari£, = mn. (57) 



We conclude that 



ii;sup sup |C/W(/?,A)| < {i?sup sup Ji(/3,A)}i/2{£;(j2)}i/2 

/3eB,. ||A|| = 1 l3eB^\\X\\ = l 

Similarly, we find the upper bound C{S[7r2(r)]na2 Amax(HJ,"<^'=P)}i/2 for the 

term Ssup^g^^ suppn^]^ \Un (/3, A)|, with k = 2,3. For this, we use the follow- 
ing fact: for any r > 



sup max 



A."(x?:./3) A^"(x^./3o) 



M'(x^/3)V2 ^'(x^./3o)l/2 



A* 



'(x^/3o)-^/^ < Crai/' 



(This inequality can be proved using Taylor's formula.) 

Relation ([SS]) with k — 1, 2, 3, follows from (C4). 

We now treat the term involving ?7„ (/3, A). By the Cauchy-Schwartz in- 
equality, \ul^\(3,X)\ < J3(/3,A)i/2j4(/3,A)i/2, where 

n 

MP,X) := ^A^X,Ay27^*_l(/3)-Miag^(X,;A)7^*_l(/3)-lAy'x,A 



i=l 



< max{A„ax[diag2(X,A)]} maxA,„ax[7e*_i(/3)-'] ■ A^HJ^^^^PA 

< C7r2(r)a„A^ax(HJ,"d^P) 

n p 77, 

J4(/3,A) := E^^KO e.<niax[Al/^Gpl]5:£fA-ie. 

n 

< C^efA-ie,, 

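and we used (48) and (AH). By (57), $E \sup_{\beta \in B_r} \sup_{\|\lambda\|=1} J_4(\beta,\lambda) \le Cn$, and 
hence 

$$\begin{aligned}
E \sup_{\beta \in B_r} \sup_{\|\lambda\|=1} |U_n^{[4]}(\beta,\lambda)| &\le \Big\{E \sup_{\beta \in B_r} \sup_{\|\lambda\|=1} J_3(\beta,\lambda)\Big\}^{1/2} \Big\{E \sup_{\beta \in B_r} \sup_{\|\lambda\|=1} J_4(\beta,\lambda)\Big\}^{1/2} \\
&\le C\{E[\pi_n^2(r)]\, n a_n \lambda_{\max}(H_n^{\mathrm{indep}})\}^{1/2}.
\end{aligned}$$

Similarly, we find the upper bound $C\{E[\pi_n^2(r)]\, n a_n^2 \lambda_{\max}[H_n^{\mathrm{indep}}]\}^{1/2}$ for the 
term $E \sup_{\beta \in B_r} \sup_{\|\lambda\|=1} |U_n^{[k]}(\beta,\lambda)|$, with $k = 5,6$, using the fact that: for any $r > 0$, 

$$\sup_{\beta \in B_r} \max_{i \le n,\, j \le m} \left| \frac{\mu''(x_{ij}^T\beta)}{\mu'(x_{ij}^T\beta)^{3/2}} - \frac{\mu''(x_{ij}^T\beta_0)}{\mu'(x_{ij}^T\beta_0)^{3/2}} \right| \mu'(x_{ij}^T\beta_0)^{-1/2} \le C r a_n^{1/2}.$$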



Relation (56) with $k = 4, 5, 6$ follows from (C4). 

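It remains to treat the term $\mathcal{E}_n^{[3]}(\beta)$. Note that $|\lambda^T \mathcal{E}_n^{[3]}(\beta) \lambda|^2 \le \sum_{l=1}^p |\lambda^T e_{n,l}^{[3]}(\beta)|^2$, 
where $e_{n,l}^{[3]}(\beta)$ denotes the $l$-th column of $\mathcal{E}_n^{[3]}(\beta)$. For any $1 \le l \le p$, we define 

$$U_{n,l}^{[3]}(\beta,\lambda) = \lambda^T e_{n,l}^{[3]}(\beta).$$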
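
The column-wise inequality noted above is a direct consequence of the Cauchy-Schwarz inequality in $\mathbb{R}^p$. A short sketch, written for a generic $p \times p$ matrix $M$ with columns $e_1, \ldots, e_p$ and a unit vector $\lambda = (\lambda_1, \ldots, \lambda_p)^T$ (generic symbols used here only for illustration):

```latex
\begin{align*}
\lambda^T M \lambda &= \sum_{l=1}^p (\lambda^T e_l)\, \lambda_l, \\
|\lambda^T M \lambda|^2
  &\le \Big(\sum_{l=1}^p |\lambda^T e_l|^2\Big) \Big(\sum_{l=1}^p \lambda_l^2\Big)
   = \sum_{l=1}^p |\lambda^T e_l|^2
   && \text{(since } \|\lambda\| = 1\text{)}.
\end{align*}
```

Since $p$ is fixed, it is therefore enough to control each of the $p$ scalar terms separately.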

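Note that $\{U_{n,l}^{[3]}(\beta,\lambda), \mathcal{F}_n\}_n$ is a martingale (with respect to $P_{\beta_0}$), and hence 
$\{\sup_{\beta \in B_r} \sup_{\|\lambda\|=1} |U_{n,l}^{[3]}(\beta,\lambda)|, \mathcal{F}_n\}_n$ is a submartingale, for any $r > 0$. Using 
the same argument as above, to conclude that for any $r > 0$, 

$$\lim_{n \to \infty} a_n^{-1/2-\delta} \sup_{\beta \in B_r} \sup_{\|\lambda\|=1} |\lambda^T \mathcal{E}_n^{[3]}(\beta) \lambda| = 0 \quad \text{a.s.},$$

it suffices to show that 

$$\lim_{n \to \infty} a_n^{-1/2-\delta} \sup_{\beta \in B_r} \sup_{\|\lambda\|=1} |U_{n,l}^{[3]}(\beta,\lambda)| = 0 \quad \text{a.s.}$$

For this, it is enough to show that: for any $r > 0$, there exists a constant $C > 0$ 
such that 

$$E \sup_{\beta \in B_r} \sup_{\|\lambda\|=1} |U_{n,l}^{[3]}(\beta,\lambda)| \le C, \quad \forall n \ge 1. \tag{58}$$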



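By the Cauchy-Schwarz inequality, $|U_{n,l}^{[3]}(\beta,\lambda)| \le J_5(\beta,\lambda)^{1/2} J_6(\beta,\lambda)^{1/2}$, 
where 

$$\begin{aligned}
J_5(\beta,\lambda) &:= \sum_{i=1}^n \lambda^T X_i^T A_i(\beta)^{1/2} \mathcal{R}^*_{i-1}(\beta)^{-1} \Big[\frac{\partial}{\partial \beta_l} \mathcal{R}^*_{i-1}(\beta)\Big] \mathcal{R}^*_{i-1}(\beta)^{-2} \Big[\frac{\partial}{\partial \beta_l} \mathcal{R}^*_{i-1}(\beta)\Big] \mathcal{R}^*_{i-1}(\beta)^{-1} A_i(\beta)^{1/2} X_i \lambda \\
&\le \max_{i \le n} \lambda_{\max}[\mathcal{R}^*_{i-1}(\beta)^{-4}] \, \max_{i \le n} \lambda_{\max}^2\Big[\frac{\partial}{\partial \beta_l} \mathcal{R}^*_{i-1}(\beta)\Big] \, \max_{i \le n} \lambda_{\max}[A_i^{-1} A_i(\beta)] \cdot \lambda^T H_n^{\mathrm{indep}} \lambda
\end{aligned}$$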

< C<(r)q^(r)A„ax(HJ,"'^<=P) 



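$$\begin{aligned}
J_6(\beta,\lambda) &:= \sum_{i=1}^n \varepsilon_i^T A_i(\beta)^{-1} \varepsilon_i \le \max_{i \le n} \lambda_{\max}[A_i^{1/2} A_i(\beta)^{-1} A_i^{1/2}] \sum_{i=1}^n \varepsilon_i^T A_i^{-1} \varepsilon_i \\
&\le C \sum_{i=1}^n \varepsilon_i^T A_i^{-1} \varepsilon_i,
\end{aligned}$$

and we used (55), (AH), (48) and Lemma 4.6. By (57), $E \sup_{\beta \in B_r} \sup_{\|\lambda\|=1} J_6(\beta,\lambda) \le Cn$, 
and hence 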

$$E \sup_{\beta \in B_r} \sup_{\|\lambda\|=1} |U_{n,l}^{[3]}(\beta,\lambda)| \le \Big\{E \sup_{\beta \in B_r} \sup_{\|\lambda\|=1} J_5(\beta,\lambda)\Big\}^{1/2} \Big\{E \sup_{\beta \in B_r} \sup_{\|\lambda\|=1} J_6(\beta,\lambda)\Big\}^{1/2}.$$

Relation (58) follows by condition (C5). $\Box$ 